Pyspark write list type to ES

(Joseph Gartner) #1

I'm attempting to write data to Elasticsearch using PySpark. I can write successfully with the "saveAsNewAPIHadoopFile" API in most cases; however, when one of the values in a JSON object is a list, it throws an error:
SparkException: Data of type java.util.ArrayList cannot be used

ES can ingest lists via other means, and it would be cumbersome to transform my lists into some other form. Any advice would be useful. For reference, I am using:
Elasticsearch 1.7.3
Spark 1.5.0-cdh5.5.1
Hadoop 2.6.0-cdh5.5.1
ES-Hadoop connector: elasticsearch-hadoop-2.2.0.jar

(Costin Leau) #2

From the docs, see the section "Handling array/multi-values field".
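
For anyone landing here with the same error, a separate workaround that is often used from PySpark (not necessarily what the docs section above describes) is to serialize each document to a JSON string yourself and set the connector's `es.input.json` option, so the list never has to pass through the Writable conversion. The following is only a minimal sketch under assumptions: the node address, the `myindex/mytype` resource, the `doc_id` field, and the helper names `to_es_pair` and `write_to_es` are all hypothetical, and the value class shown is the one commonly seen in examples rather than something confirmed in this thread.

```python
import json

# Hypothetical connection settings; adjust for your own cluster.
ES_WRITE_CONF = {
    "es.nodes": "localhost",           # assumed ES host
    "es.port": "9200",                 # assumed ES HTTP port
    "es.resource": "myindex/mytype",   # hypothetical index/type
    "es.input.json": "true",           # values are pre-serialized JSON strings
}


def to_es_pair(doc):
    """Turn a dict (which may contain lists) into a (key, json_string) pair.

    The key is ignored by the output format here; the document itself is
    sent as a JSON string, so lists are serialized by json.dumps.
    """
    return (doc.get("doc_id"), json.dumps(doc))


def write_to_es(rdd, conf=ES_WRITE_CONF):
    """Write an RDD of dicts to Elasticsearch as raw JSON documents."""
    rdd.map(to_es_pair).saveAsNewAPIHadoopFile(
        path="-",  # unused by EsOutputFormat, but the argument is required
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )
```

With an existing SparkContext `sc`, usage would look like `write_to_es(sc.parallelize([{"doc_id": "1", "tags": ["a", "b"]}]))`, assuming the elasticsearch-hadoop jar is on the driver and executor classpath.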
