Best practice elasticsearch index schema for Spark SQL


(Thomas Decaux) #1

Hello,

I am using Elasticsearch with Spark SQL in order to query my data from Tableau. It works for very simple index structures, but as soon as I add nested fields I always get an exception such as:

Field 'product' not found; typically this occurs with arrays which are not mapped as single value

Hence my question: is there a best practice for defining the data schema (such as avoiding nested arrays, maybe)? And what exactly does this kind of error mean?

I am using Spark 1.5.2 with ES 2.1 and Hue notebooks:

CREATE TEMPORARY TABLE events_all USING org.elasticsearch.spark.sql OPTIONS (nodes "elasticsearch", path "events/events", read.field.include "event.*");

SELECT COUNT(*) FROM events_all

Will error:

org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'product' not found; typically this occurs with arrays which are not mapped as single value
at org.elasticsearch.spark.sql.RowValueReader$class.rowColumns(RowValueReader.scala:33)
at org.elasticsearch.spark.sql.ScalaRowValueReader.rowColumns(ScalaEsRowValueReader.scala:13)
at org.elasticsearch.spark.sql.ScalaRowValueReader.createMap(ScalaEsRowValueReader.scala:49)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:645)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:588)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:661)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:588)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:383)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:318)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:213)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:186)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:438)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)


(Costin Leau) #2

Hi,

Support for field arrays was introduced two milestone versions ago, in ES-Hadoop 2.2, and as of RC1 it is also documented. Can you please review this section of the docs and report back if the issue persists?
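For reference, a minimal sketch of how that setting could be applied to the table definition above, assuming the multi-valued field is `event.product` (that field name is a guess based on the error message, not something from the docs):

```sql
-- Sketch: declare which fields the connector should read as arrays,
-- using the es.read.field.as.array.include option (available since ES-Hadoop 2.2).
-- In Spark SQL OPTIONS the "es." prefix can be dropped, as with the other options here.
CREATE TEMPORARY TABLE events_all
USING org.elasticsearch.spark.sql
OPTIONS (
  nodes "elasticsearch",
  path "events/events",
  read.field.include "event.*",
  read.field.as.array.include "event.product"  -- assumed field name
);
```

With that hint the connector maps the field as an array type instead of a single value, which is what the `Field 'product' not found` error is complaining about.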

Thanks,


(system) #3