Elasticsearch-hadoop not discovering all fields in the index

(Dustin Decker) #1

The index I'm querying has over 400 fields, and each document contains a subset of these fields.

When performing a simple count query on the src_ip field, I get this exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`src_ip`' given input columns: [@timestamp, event_timestamp, geoip, type]; line 1 pos 7;

It only recognizes 4 out of the >400 fields.

I tried setting es.read.field.include, but it only selects among the four input columns listed above.

How can I perform queries against all of my fields?

(Dustin Decker) #2

I found my error.

When declaring the dataframe, you need to specify <index-name>/<document-type>. I was only specifying the index name.

This is what worked:
val df = sqlContext.read.format(esFormat).options(elasticOptions).load("data-2017.05.21/sometype")
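
For anyone hitting the same thing, here is a minimal sketch of the whole flow around that line. The values of esFormat and elasticOptions are not shown above, so the ones below are assumptions (the standard elasticsearch-hadoop Spark SQL source name and a local node on port 9200). The view name events and the count query are also only illustrative, and the sketch uses a Spark 2.x SparkSession in place of sqlContext.

```scala
import org.apache.spark.sql.SparkSession

object SrcIpCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("es-src-ip-count")
      .master("local[*]")
      .getOrCreate()

    // Assumed values: the original post does not show these definitions.
    val esFormat = "org.elasticsearch.spark.sql"
    val elasticOptions = Map(
      "es.nodes" -> "localhost", // assumed Elasticsearch host
      "es.port"  -> "9200"       // assumed Elasticsearch HTTP port
    )

    // The resource is given as <index-name>/<document-type>, not the index name alone.
    val df = spark.read
      .format(esFormat)
      .options(elasticOptions)
      .load("data-2017.05.21/sometype")

    // Print the discovered schema to confirm src_ip is among the columns.
    df.printSchema()

    // Hypothetical view name; counts the rows where src_ip is not null.
    df.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(src_ip) FROM events").show()

    spark.stop()
  }
}
```

Running printSchema() before the query is a quick way to check which columns were discovered, since the AnalysisException above only lists them after the query has already failed.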
