Field not found; typically this occurs with arrays which are not mapped as single value

kraney · March 24, 2016, 7:31pm

I'm a new user, and I'm running into this error. Indeed the field it's complaining about is an array.

I'm running elasticsearch-hadoop-2.2.0-rc1

I saw this topic on the error and attempted to use the fix with no improvement. That could most definitely be some newbie mistake I am making.

I am attempting to set the es.read.field.as.array.include option as follows (in spark-shell):

val options = Map("es.read.field.as.array.include" -> "*addr_geo.*.geoname_id")
val flows = sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("flows-*/flow_log")
flows.show

But observe no change in behavior. Is there some other way I should set this option?

As a minor comment, it would be helpful if the documentation of these options included an example in scala. I can see what the option is and what value it should have, but it's not clear to me where to set it.

costin · March 24, 2016, 7:52pm

Please use the GA version (2.2.0) and then report back.

kraney · March 24, 2016, 8:15pm

I obtained the GA release and tried again.

Actually from printSchema I can see that the change is taking effect and the field is being treated as an array, but I still get the error.

I think more importantly, this is a field that may or may not be present in the document at all.

kraney · March 25, 2016, 4:11pm

It doesn't seem to be related to the field not always being present; I see the same error even if I specify a query that restricts things down to only documents that contain the field. It seems rather to be related to the document being deeply nested.

When I add fields more than one 'dot' deep to the 'es.read.field.exclude' setting, it doesn't seem to work. I can make the error go away if I exclude the entire structure, but I need access to some of the data at the same depth.

I'm sorry my description is a bit vague. I attempted to attach the Elasticsearch mapping and a sample document for this data, but it was too big to be posted here. Is there a place to post such information?

kraney · March 25, 2016, 7:28pm

Ahah! I have finally found my rookie mistake.

Not only was the field mentioned in the error an array, but one of its parents was an array as well. I was only flagging the field itself in es.read.field.as.array.include, not its parent.

If a.b is an array of structs, the error is reported on the first leaf-level field (like a.b.c.d) and not on a.b. Once I realized my mistake, I was able to clear all of the errors.

costin · April 5, 2016, 2:40pm

Can you expand on this? What was your mapping and what was the exception that occurred initially and what was the configuration fix?

Thanks,

kraney · April 5, 2016, 3:21pm

Sure - let's say the data looks like this:

    {
      "a": {
        "b": [
          {"c": {"d": [1, 2, 3, 4]}},
          {"c": {"d": [5, 6, 7, 8]}}
        ]
      }
    }

In this case, I should specify es.read.field.as.array.include = a.b,a.b.c.d

If I do not specify es.read.field.as.array.include at all, I get the error "Field 'a.b.c.d' not found". Seeing the error for field a.b.c.d, I went and specified es.read.field.as.array.include = a.b.c.d.

Having done so, there's still an error. Field a.b is an array, too, and I haven't told es-spark about it. But the error you get still says "Field 'a.b.c.d' not found". Since the error was the same as before, I thought my change was having no effect.
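
For the sample document above, the working configuration could be sketched in spark-shell roughly as follows. This is a minimal sketch, not a tested setup: the object name `ArrayIncludeSketch` and the index/type `my-index/my-type` are placeholders, and the commented-out read assumes the elasticsearch-spark jar is on the classpath.

```scala
// Sketch only: ArrayIncludeSketch is a hypothetical name used for illustration.
object ArrayIncludeSketch {
  // List every array on the path, the parent (a.b) as well as the leaf (a.b.c.d).
  val arrayFields: Seq[String] = Seq("a.b", "a.b.c.d")

  // The option takes a single comma-separated string of field names.
  val options: Map[String, String] =
    Map("es.read.field.as.array.include" -> arrayFields.mkString(","))

  // In spark-shell ("my-index/my-type" is a placeholder resource):
  // val df = sqlContext.read
  //   .format("org.elasticsearch.spark.sql")
  //   .options(options)
  //   .load("my-index/my-type")
  // df.printSchema()  // check that both a.b and a.b.c.d show up as arrays
}
```

Checking printSchema after each change is more telling than the exception text, since the error message keeps naming the leaf field even when the parent is the one still missing from the list.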

giaosudau · November 24, 2016, 11:24am

Hey,
I currently have the same problem even though I am using 2.4.2.

val options = Map("es.read.field.as.array.include" -> "*.score")

val df = sqlContext.read.options(options).format("es").load("click_detail/rawlog")
df.count()

I get
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'ctp.score' not found; typically this occurs with arrays which are not mapped as single value

How can I fix that?
I am using Spark 2.0.2.
ES 1.7.3
Thanks.

giaosudau · November 25, 2016, 2:56am

I also tried:

val options = Map("es.read.field.as.array.include" -> "ctp,cit,mit,mrm,mtp,cimit,mimit")

but it still didn't work.
