Pyspark write list type to ES

Joseph_Gartner · March 9, 2016, 8:38pm

I'm attempting to write data to Elasticsearch using PySpark. Writing with the "saveAsNewAPIHadoopFile" API works in most cases. However, when a field in one of the JSON objects is a list, the write fails with this error:
SparkException: Data of type java.util.ArrayList cannot be used

ES can ingest lists by other means, and converting my lists into some other form is somewhat cumbersome. Any advice would be appreciated. For reference, I am using:
Elasticsearch 1.7.3
Spark 1.5.0-cdh5.5.1
Hadoop 2.6.0-cdh5.5.1
ES-Hadoop connector: elasticsearch-hadoop-2.2.0.jar

costin · March 12, 2016, 2:16pm

See the section "Handling array/multi-values field" in the docs.
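One commonly used workaround (not stated in this thread, so treat it as an assumption to verify against your connector version) is to serialize each document to a JSON string in Python and set "es.input.json" so the connector indexes the string as-is. A plain string crosses the Python/JVM boundary as text, so no java.util.ArrayList reaches the connector. The helper names "to_json_pair" and "write_json_to_es" are hypothetical, and the "index/type" resource format assumes ES 1.x.

```python
import json


def to_json_pair(doc, id_field="id"):
    """Hypothetical helper: turn one dict (which may contain lists) into a
    (key, value) pair whose value is a JSON string."""
    # The key is only a placeholder here; set "es.mapping.id" in the conf
    # if the document id should be taken from a field of the document.
    return (str(doc.get(id_field, "")), json.dumps(doc))


def write_json_to_es(rdd, resource, nodes="localhost", port="9200"):
    """Hypothetical helper: write an RDD of dicts to Elasticsearch as
    pre-serialized JSON. Assumes elasticsearch-hadoop is on the Spark
    classpath and `rdd` is a pyspark RDD."""
    conf = {
        "es.resource": resource,  # "index/type" for ES 1.x
        "es.nodes": nodes,
        "es.port": port,
        # Tell the connector each value is already a JSON document.
        "es.input.json": "yes",
    }
    rdd.map(to_json_pair).saveAsNewAPIHadoopFile(
        path="-",
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )
```

Usage would look like write_json_to_es(sc.parallelize([{"id": "1", "tags": ["a", "b"]}]), "myindex/mytype"). The list field stays a list in the JSON string, so Elasticsearch receives it as an array without any reshaping on the Python side beyond json.dumps.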
