I'm attempting to write a Hadoop RDD to Elasticsearch without the luxury of seeing the fields before they are created. So far, dynamic mapping has worked well when field values are scalar, but when I attempt to write fields whose values are lists/arrays, I get errors.
I found this post, which I hoped would solve my problem, but no luck yet.
I'm also still a bit new to Elasticsearch. I'm wondering: is it possible to take `es.read.field.as.array.include` and apply it to all fields? Something like:
```python
conf = {
    "es.resource": "j34/record",
    "es.nodes": "192.168.45.10:9200",
    "es.mapping.exclude": "temp_id",
    "es.mapping.id": "temp_id",
    "es.read.field.as.array.include": "*"  # this setting specifically
}
```
With the understanding that all fields would then be arrays? Thanks for any suggestions or advice.
As I type this, I'm wondering if I would need to exclude fields like `temp_id` and `_id`?
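For context, here's roughly how I'm performing the write. The sample records and field names below are just stand-ins for my real data:

```python
from pyspark import SparkContext

sc = SparkContext(appName="es-write")

# Stand-in records: (doc_id, document) pairs. A document with only
# scalar values indexes fine; the one containing a list triggers
# the error shown below.
rdd = sc.parallelize([
    ("1", {"temp_id": "1", "name": "foo"}),
    ("2", {"temp_id": "2", "name": "bar", "tags": ["a", "b"]}),
])

rdd.saveAsNewAPIHadoopFile(
    path="-",
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf,  # the conf dict from above
)
```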
The specific error I'm getting:
```
org.apache.spark.SparkException: Data of type java.util.ArrayList cannot be used
    at org.apache.spark.api.python.JavaToWritableConverter.org$apache$spark$api$python$JavaToWritableConverter$$convertToWritable(PythonHadoopUtil.scala:141)
    at org.apache.spark.api.python.JavaToWritableConverter$$anonfun$org$apache$spark$api$python$JavaToWritableConverter$$convertToWritable$1.apply(PythonHadoopUtil.scala:134)
```
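Given the `es.read.` prefix, I suspect that setting may only apply when reading from Elasticsearch rather than writing. For the write side, the one workaround I've seen suggested (but haven't verified yet) is to pre-serialize each document to a JSON string and turn on `es.input.json`, so the Python-to-Writable conversion never sees an ArrayList:

```python
import json

# Untested sketch: ship each document as a JSON string so that
# elasticsearch-hadoop parses it directly, instead of PySpark trying to
# convert a dict (and any list values inside it) into Writables.
json_rdd = rdd.mapValues(json.dumps)

json_conf = dict(conf)
json_conf["es.input.json"] = "true"
# Not sure whether es.mapping.exclude still applies when writing raw JSON.

json_rdd.saveAsNewAPIHadoopFile(
    path="-",
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.apache.hadoop.io.Text",  # values are now JSON strings
    conf=json_conf,
)
```

Would that be the right direction here, or is there a way to make the array-include setting do what I want?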