I'm attempting to write a Hadoop RDD to Elasticsearch without the luxury of seeing the fields before they are created. So far, dynamic mapping has worked well when field values are scalar, but when I attempt to write fields whose values are lists/arrays, I get errors.
I found this post, which I hoped would solve my problem, but no luck yet.
I'm also still a bit new to Elasticsearch. I'm wondering: is it possible to take `es.read.field.as.array.include` and apply it to all fields? Something like:
```python
conf = {
    "es.resource": "j34/record",
    "es.nodes": "192.168.45.10:9200",
    "es.mapping.exclude": "temp_id",
    "es.mapping.id": "temp_id",
    "es.read.field.as.array.include": "*"  # this setting specifically
}
```
With the understanding that all fields would then be arrays? Thanks for any suggestions or advice.
As I type this, I'm wondering if I would need to exclude fields like `temp_id` and `_id`?
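For context, here's roughly how I'm performing the write. The sample records and field names below are just stand-ins for my real data:

```python
from pyspark import SparkContext

sc = SparkContext(appName="es-write")

# Stand-in records: (doc_id, document) pairs. A document with only
# scalar values indexes fine; the one containing a list triggers
# the error shown below.
rdd = sc.parallelize([
    ("1", {"temp_id": "1", "name": "foo"}),
    ("2", {"temp_id": "2", "name": "bar", "tags": ["a", "b"]}),
])

rdd.saveAsNewAPIHadoopFile(
    path="-",
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf,  # the conf dict from above
)
```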
The specific error I'm getting:
```
org.apache.spark.SparkException: Data of type java.util.ArrayList cannot be used
    at org.apache.spark.api.python.JavaToWritableConverter.org$apache$spark$api$python$JavaToWritableConverter$$convertToWritable(PythonHadoopUtil.scala:141)
    at org.apache.spark.api.python.JavaToWritableConverter$$anonfun$org$apache$spark$api$python$JavaToWritableConverter$$convertToWritable$1.apply(PythonHadoopUtil.scala:134)
```
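Given the `es.read.` prefix, I suspect that setting may only apply when reading from Elasticsearch rather than writing. For the write side, the one workaround I've seen suggested (but haven't verified yet) is to pre-serialize each document to a JSON string and turn on `es.input.json`, so the Python-to-Writable conversion never sees an ArrayList:

```python
import json

# Untested sketch: ship each document as a JSON string so that
# elasticsearch-hadoop parses it directly, instead of PySpark trying to
# convert a dict (and any list values inside it) into Writables.
json_rdd = rdd.mapValues(json.dumps)

json_conf = dict(conf)
json_conf["es.input.json"] = "true"
# Not sure whether es.mapping.exclude still applies when writing raw JSON.

json_rdd.saveAsNewAPIHadoopFile(
    path="-",
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.apache.hadoop.io.Text",  # values are now JSON strings
    conf=json_conf,
)
```

Would that be the right direction here, or is there a way to make the array-include setting do what I want?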