Elasticsearch-hadoop not discovering all fields in the index

Dustin_Decker · May 21, 2017, 3:29pm

The index I'm querying has over 400 fields, and each document contains a subset of these fields.

When performing a simple count query on a src_ip field, I get this exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`src_ip`' given input columns: [@timestamp, event_timestamp, geoip, type]; line 1 pos 7;

It only recognizes 4 out of the >400 fields.

I tried es.read.field.include, but found it only filters among the input columns in the list above.

How can I perform queries against all of my fields?

Dustin_Decker · May 21, 2017, 7:03pm

I found my error.

When declaring the dataframe, you need to specify <index-name>/<document-type>. I was only specifying the index name.

This is what worked:
val df = sqlContext.read.format(esFormat).options(elasticOptions).load("data-2017.05.21/sometype")
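
For anyone landing here later, a fuller sketch of the same fix might look like the following. It is not the original poster's code: the values of esFormat and elasticOptions were not shown in the thread, so the ones below (the "org.elasticsearch.spark.sql" format name and localhost:9200) are assumptions, as are the esResource helper, the "events" view name, and the use of SparkSession in place of sqlContext. Printing the schema before querying is a quick way to check whether src_ip was actually discovered.

```scala
import org.apache.spark.sql.SparkSession

object EsFieldDiscovery {
  // Hypothetical helper: builds the "<index-name>/<document-type>" resource string.
  def esResource(index: String, docType: String): String = s"$index/$docType"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("es-field-discovery")
      .master("local[*]")
      .getOrCreate()

    // Assumed values; the thread does not show how these were defined.
    val esFormat = "org.elasticsearch.spark.sql"
    val elasticOptions = Map(
      "es.nodes" -> "localhost",
      "es.port"  -> "9200"
    )

    // Load with index AND document type, as described in the post above.
    val df = spark.read
      .format(esFormat)
      .options(elasticOptions)
      .load(esResource("data-2017.05.21", "sometype"))

    // Inspect which columns were discovered before running any SQL.
    df.printSchema()

    // Register a temporary view (name is arbitrary) and count on src_ip.
    df.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(src_ip) FROM events").show()

    spark.stop()
  }
}
```

Running this requires the elasticsearch-hadoop (or elasticsearch-spark) jar on the classpath and a reachable Elasticsearch node. If printSchema() still lists only a handful of columns, the resource string is the first thing to double-check.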

system · June 18, 2017, 7:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
