Allow multivalued/array for all fields?

I'm attempting to write a Hadoop RDD to Elasticsearch without the luxury of knowing the fields before they are created. So far, dynamic mapping has been working well when the values for fields are singular, but when I attempt to write fields whose values are lists/arrays, I get errors.

I found this post which I hoped would solve my problem, but no luck yet.

I'm also still a bit new to Elasticsearch. I'm wondering: is it possible to use the es.read.field.as.array.include setting and apply it to all fields? Something like:

conf={
	"es.resource":"j34/record",
	"es.nodes":"192.168.45.10:9200",
	"es.mapping.exclude":"temp_id",
	"es.mapping.id":"temp_id",
	"es.read.field.as.array.include":"*" # this setting specifically
	}

With the understanding that all fields would then be arrays? Thanks for any suggestions or advice.

As I type this, I'm also wondering whether I would need to exclude fields like temp_id and _id?

The specific error I'm getting:
org.apache.spark.SparkException: Data of type java.util.ArrayList cannot be used
	at org.apache.spark.api.python.JavaToWritableConverter.org$apache$spark$api$python$JavaToWritableConverter$$convertToWritable(PythonHadoopUtil.scala:141)
	at org.apache.spark.api.python.JavaToWritableConverter$$anonfun$org$apache$spark$api$python$JavaToWritableConverter$$convertToWritable$1.apply(PythonHadoopUtil.scala:134)

My first post was short-lived, thanks to this StackOverflow post.

The solution was not to use es.read.field.as.array.include, but to convert all Python lists to tuples before handing the data over to the RDD and, eventually, saveAsNewAPIHadoopFile().
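In case it helps anyone else, here is a minimal sketch of what that conversion can look like. This is not my exact job; the document contents, the temp_id/tags field names, and the (id, dict) pair layout are made up for illustration, and the conf keys are the same ones from my question above.

	def lists_to_tuples(value):
	    """Recursively replace Python lists with tuples so the documents
	    no longer contain java.util.ArrayList when converted to Writables."""
	    if isinstance(value, list):
	        return tuple(lists_to_tuples(v) for v in value)
	    if isinstance(value, dict):
	        return {k: lists_to_tuples(v) for k, v in value.items()}
	    return value

	from pyspark import SparkContext

	sc = SparkContext(appName="es-write-example")

	# Hypothetical documents; "tags" is the kind of list-valued field that failed.
	docs = sc.parallelize([
	    ("doc-1", {"temp_id": "doc-1", "tags": ["red", "blue"]}),
	    ("doc-2", {"temp_id": "doc-2", "tags": ["green"]}),
	])

	# Convert lists to tuples in every document before handing the RDD to Hadoop.
	converted = docs.mapValues(lists_to_tuples)

	conf = {
	    "es.resource": "j34/record",
	    "es.nodes": "192.168.45.10:9200",
	    "es.mapping.exclude": "temp_id",
	    "es.mapping.id": "temp_id",
	}

	converted.saveAsNewAPIHadoopFile(
	    path="-",
	    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
	    keyClass="org.apache.hadoop.io.NullWritable",
	    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
	    conf=conf,
	)

Without the mapValues(lists_to_tuples) step, the same write raises the ArrayList error shown above; with it, the list-valued fields index as arrays under dynamic mapping.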
