I'm a new user, and I'm running into this error. The field it's complaining about is indeed an array.
I'm running elasticsearch-hadoop-2.2.0-rc1
I saw this topic on the error and attempted the fix, but saw no improvement. That could very well be some newbie mistake on my part.
I am attempting to set the es.read.field.as.array.include option as follows (in spark-shell):
val options = Map("es.read.field.as.array.include" -> "*addr_geo.*.geoname_id")
val flows = sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("flows-*/flow_log")
flows.show
But I observe no change in behavior. Is there some other way I should set this option?
As a minor comment, it would be helpful if the documentation of these options included an example in Scala. I can see what each option is and what value it should have, but it's not clear to me where to set it.
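For instance, something like the following would have made it obvious. This is a sketch of what I eventually pieced together; the index name and field are placeholders, and sqlContext is the one spark-shell provides:

// Option 1: pass the connector settings as a Map
val opts = Map("es.read.field.as.array.include" -> "my_array_field")
val df1 = sqlContext.read.format("org.elasticsearch.spark.sql").options(opts).load("my-index/my-type")

// Option 2: set each option individually on the reader
val df2 = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include", "my_array_field")
  .load("my-index/my-type")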
It doesn't seem to be related to the field not always being present; I see the same error even with a query that restricts results to only documents containing the field. It seems instead to be related to the document being deeply nested.
When I add fields more than one 'dot' deep to the es.read.field.exclude setting, it doesn't seem to work. I can make the error go away by excluding the entire structure, but I need access to some of the data at the same depth.
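To illustrate what I mean (the nested segment name here is made up; my real fields live under addr_geo):

// A deep path seems to have no effect; the error persists:
val deepExclude = Map("es.read.field.exclude" -> "addr_geo.city.geoname_id")
// Excluding the whole structure clears the error, but drops sibling fields I need:
val wholeExclude = Map("es.read.field.exclude" -> "addr_geo")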
I'm sorry my description is a bit vague. I attempted to attach the Elasticsearch mapping and a sample document for this data, but it was too big to be posted here. Is there a place to post such information?
Not only was the field mentioned in the error an array, but one of its parents was an array as well. I was only flagging the leaf field in es.read.field.as.array.include, not its parent.
If a.b is an array of structs, the error is reported on the first leaf-level field (like a.b.c.d) and not on a.b. Once I realized my mistake, I was able to clear all of the errors.
In this case, I should specify es.read.field.as.array.include = a.b,a.b.c.d
If I do not specify es.read.field.as.array.include, I get the error Field 'a.b.c.d' not found. Seeing the error name the field a.b.c.d, I went and specified es.read.field.as.array.include = a.b.c.d.
Having done so, there's still an error: field a.b is an array too, and I hadn't told es-spark about it. But the error still says Field 'a.b.c.d' not found. Since the error was the same as before, I thought my change was having no effect.
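For the record, the working setup looks something like this in spark-shell (using the illustrative a.b names from above, not my real mapping):

// Flag BOTH the parent array and the leaf array, even though the error
// only ever names the leaf field a.b.c.d
val options = Map("es.read.field.as.array.include" -> "a.b,a.b.c.d")
val flows = sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("flows-*/flow_log")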
Hey,
I currently have the same problem, even though I am using 2.4.2:
val options = Map("es.read.field.as.array.include" -> "*.score")
val df = sqlContext.read.options(options).format("es").load("click_detail/rawlog")
df.count()
I get org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'ctp.score' not found; typically this occurs with arrays which are not mapped as single value
How do I fix that?
I am using Spark 2.0.2.
ES 1.7.3
Thanks.
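Based on the fix described earlier in this thread, if ctp is itself an array of objects, it likely needs to be flagged as well, not just the leaf ctp.score. A sketch, assuming that mapping:

// Flag the parent array in addition to the leaf field
val options = Map("es.read.field.as.array.include" -> "ctp,ctp.score")
val df = sqlContext.read.options(options).format("es").load("click_detail/rawlog")
df.count()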