How is the ES schema determined when reading with ES-Hadoop?

Sonny_Heer · February 9, 2019, 12:11am

Env:
using this jar: elasticsearch-spark-13_2.10-5.1.1.jar

(sqlContext.read.format("org.elasticsearch.spark.sql")
    .option("es.nodes", es_url)
    .option("es.port", "443")
    .option("es.nodes.wan.only", "true")
    .option("es.net.ssl", "true")
    .option("es.read.field.as.array.include", array_with_comma)
    .option("es.mapping.date.rich", "false")
    .option("es.read.field.exclude", exclude_with_comma)
    .option("es.read.field.include", "")
    .option("pushdown", "true")
    .load(es_index))

We are not passing any other arguments apart from the exclude fields, which we use to exclude a couple of top-level fields.

The problem is that some fields are missing from the output of dataframe.printSchema().

Does it use _mapping to figure out the schema, or does it sample documents? I didn't find any docs on this.

Thanks.

james.baiera · February 11, 2019, 9:54pm

ES-Hadoop uses the mapping endpoint for the resource given. However, in the 5.x line there is a bug when reading from multiple indices or types: when the schema is discovered at the start of the process, only one mapping is picked up and used for the fields. I would check that you are only reading from one index and type or, if you are reading from multiple, ensure that their mappings are identical across the board.
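
One way to check whether several indices or types share the same fields is to compare their mappings offline. The following is only a minimal sketch, assuming the response of the mapping endpoint has already been fetched and parsed into a dict with the 5.x layout (index -> "mappings" -> type -> "properties"). The function names and the sample index names "logs-a" and "logs-b" are hypothetical and not part of ES-Hadoop.

```python
def flatten_properties(properties, prefix=""):
    """Return {dotted.field.name: type} for a mapping 'properties' dict."""
    fields = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object or nested field: recurse into its sub-fields.
            fields.update(flatten_properties(spec["properties"], path + "."))
        else:
            fields[path] = spec.get("type", "object")
    return fields


def fields_by_resource(mapping_response):
    """Map 'index/type' to its flattened fields (assumed 5.x response layout)."""
    result = {}
    for index, body in mapping_response.items():
        for type_name, type_mapping in body.get("mappings", {}).items():
            key = index + "/" + type_name
            result[key] = flatten_properties(type_mapping.get("properties", {}))
    return result


def missing_fields(mapping_response):
    """For each resource, list fields that exist in some other resource but not in it."""
    per_resource = fields_by_resource(mapping_response)
    all_fields = set()
    for fields in per_resource.values():
        all_fields.update(fields)
    return {res: sorted(all_fields - set(fields))
            for res, fields in per_resource.items()}


# Hypothetical sample data standing in for a parsed mapping response.
sample = {
    "logs-a": {"mappings": {"doc": {"properties": {
        "id": {"type": "keyword"},
        "user": {"properties": {"name": {"type": "text"}}},
    }}}},
    "logs-b": {"mappings": {"doc": {"properties": {
        "id": {"type": "keyword"},
        "user": {"properties": {"name": {"type": "text"},
                                "age": {"type": "integer"}}},
        "tags": {"type": "keyword"},
    }}}},
}

missing = missing_fields(sample)
print(missing)
```

If every list in the result is empty, the resources expose the same field names. A non-empty list names fields that could be absent from the DataFrame schema if that resource's mapping happens to be the one picked up. This sketch compares field names only, not field types.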

Sonny_Heer · February 11, 2019, 11:06pm

Thanks James! That helps. It appears the mapping isn't being updated when the other team adds data, which is causing our issue. Thanks again for confirming.

