Handling array values when reading from Elasticsearch in Spark using elasticsearch-spark


I am trying to read from Elasticsearch in Spark using the es-hadoop library. I know that when reading with es-hadoop, we need to pass the option `es.read.field.as.array.include` to handle fields with array values, as in:

```scala
ApacheSpark.sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "xx.xxx.xxx")
  .option("es.nodes.client.only", false)
  .option("pushdown", true)
  .option("es.read.field.as.array.include", "tags,fields.component,log.flags,ecs,message")
  .load("ds2-hue_error-2020.09.22")
```

But for that we need to know in advance which document fields in the Elasticsearch index contain arrays, and the dimension of those arrays, before calling the Spark read API. Is there any way to know which fields need to be passed as arrays? Otherwise the read throws an error.
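One complication is that Elasticsearch mappings do not distinguish a single value from an array of values, so the index mapping alone cannot answer this; a common workaround is to sample some documents (for example via the search API) and derive the list of array fields from the actual data. A minimal sketch of that idea, assuming the sampled documents have already been fetched as Python dicts (the function names here are illustrative, not part of any library):

```python
from typing import Any, Dict, List, Set

def find_array_fields(doc: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect dotted paths of fields whose value is a JSON array."""
    paths: Set[str] = set()
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, list):
            paths.add(path)
            # Recurse into arrays of objects so nested array fields are found too.
            for item in value:
                if isinstance(item, dict):
                    paths |= find_array_fields(item, path)
        elif isinstance(value, dict):
            paths |= find_array_fields(value, path)
    return paths

def array_include_option(docs: List[Dict[str, Any]]) -> str:
    """Union of array paths over sampled docs, formatted for
    es.read.field.as.array.include."""
    paths: Set[str] = set()
    for doc in docs:
        paths |= find_array_fields(doc)
    return ",".join(sorted(paths))

# Hypothetical sampled documents, loosely shaped like the index above.
sample = [
    {"tags": ["a", "b"], "fields": {"component": ["hue"]}, "message": "x"},
    {"log": {"flags": ["multiline"]}, "message": ["y", "z"]},
]
print(array_include_option(sample))
# → fields.component,log.flags,message,tags
```

Note this only sees fields that happen to be arrays in the sampled documents, so a small or unrepresentative sample can miss fields that are arrays elsewhere in the index.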


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.