Field not found; typically this occurs with arrays which are not mapped as single value

I'm a new user, and I'm running into this error. Indeed, the field it's complaining about is an array.

I'm running elasticsearch-hadoop-2.2.0-rc1.

I found an existing topic about this error and tried its fix, with no improvement. That could well be a newbie mistake on my part.

I am attempting to set the es.read.field.as.array.include option as follows (in spark-shell):

    val options = Map("es.read.field.as.array.include" -> "*addr_geo.*.geoname_id")
    val flows = sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("flows-*/flow_log")
    flows.show()

But I see no change in behavior. Is there some other way I should set this option?

As a minor comment, it would be helpful if the documentation of these options included an example in Scala. I can see what each option is and what value it should take, but it's not clear to me where to set it.

Please use the GA version (2.2.0) and then report back.

I obtained the GA release and tried again.

Actually, from printSchema I can see that the change is taking effect and the field is being treated as an array, but I still get the error.

More importantly, I think, this field may or may not be present in a given document at all.

It doesn't seem to be related to the field not always being present; I see the same error even when I specify a query that restricts the results to documents that contain the field. Rather, it seems to be related to the documents being deeply nested.

When I add fields more than one 'dot' deep to the 'es.read.field.exclude' setting, it doesn't seem to work. I can make the error go away if I exclude the entire structure, but I need access to some of the data at the same depth.

I'm sorry my description is a bit vague. I attempted to attach the Elasticsearch mapping and a sample document for this data, but it was too big to be posted here. Is there a place to post such information?

Aha! I have finally found my rookie mistake.

Not only was the field mentioned in the error an array, but one of its parents was an array as well. I was only flagging the field in es.read.field.as.array.include, not its parent.

If a.b is an array of structs, the error is reported on the first leaf-level field (like a.b.c.d) and not on a.b. Once I realized my mistake, I was able to clear all of the errors.
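
Because the exception names only the leaf field, every enclosing path of the reported name is a suspect. Below is a minimal pure-Scala sketch that lists those paths so each one can be checked against the mapping or a sample document. ErrorFieldAncestors and candidatePaths are hypothetical helpers, not part of es-hadoop.

```scala
// Sketch: list every prefix of a dotted field name reported in the
// "Field ... not found" exception, outermost first. Any of these paths that
// holds an array in the source documents needs to be flagged, not just the leaf.
object ErrorFieldAncestors {

  /** "a.b.c.d" => Seq("a", "a.b", "a.b.c", "a.b.c.d") */
  def candidatePaths(field: String): Seq[String] = {
    val parts = field.split('.').toSeq
    // Take the first n segments for n = 1..all, rejoining them with dots.
    (1 to parts.length).map(n => parts.take(n).mkString("."))
  }
}
```

With a.b being an array of structs as described, candidatePaths("a.b.c.d") returns a, a.b, a.b.c and a.b.c.d; a.b is the one that is easy to overlook.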

Can you expand on this? What was your mapping, what exception occurred initially, and what was the configuration fix?

Thanks,

Sure - let's say the data looks like this:

    {
      "a": {
        "b": [
          { "c": { "d": [1, 2, 3, 4] } },
          { "c": { "d": [5, 6, 7, 8] } }
        ]
      }
    }

In this case, I should specify es.read.field.as.array.include = a.b,a.b.c.d
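
One way to avoid missing a parent is to walk a sample document and collect every path whose value is an array. Below is a minimal pure-Scala sketch. It assumes the JSON has already been decoded (by whatever JSON library you use) into nested Map[String, Any] and List values; ArrayPaths, arrayPaths and includeSetting are hypothetical names, not part of es-hadoop.

```scala
// Sketch: collect the dotted path of every array-valued field in a sample document.
// Assumption: the JSON has already been decoded into nested Map[String, Any] / List values.
object ArrayPaths {

  /** Dotted paths of all fields whose value is an array, at any depth. */
  def arrayPaths(doc: Map[String, Any]): Set[String] = {
    def walk(value: Any, path: String): Set[String] = value match {
      case m: Map[_, _] =>
        // Object: recurse into each field, extending the dotted path.
        m.toSeq.flatMap { case (k, v) =>
          walk(v, if (path.isEmpty) k.toString else s"$path.$k")
        }.toSet
      case s: Seq[_] =>
        // Array: flag this path, then look inside the elements, which share the same path.
        Set(path) ++ s.flatMap(walk(_, path))
      case _ =>
        // Scalar leaf: nothing to flag.
        Set.empty[String]
    }
    walk(doc, "")
  }

  /** Comma-separated value suitable for es.read.field.as.array.include. */
  def includeSetting(doc: Map[String, Any]): String =
    arrayPaths(doc).toSeq.sorted.mkString(",")
}
```

Applied to the sample document above, includeSetting returns a.b,a.b.c.d, matching the value given here. It can only flag arrays that actually occur in the documents it is given, so a field that holds a single value in the sample but an array elsewhere would still be missed.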

If I do not specify es.read.field.as.array.include, I get the error "Field a.b.c.d not found". Seeing the error on field a.b.c.d, I went and specified es.read.field.as.array.include = a.b.c.d.

Having done so, there's still an error: field a.b is an array too, and I haven't told es-spark about it. But the error still says "Field a.b.c.d not found". Since the error was the same as before, I thought my change was having no effect.

Hey,
I currently have the same problem, even though I am using 2.4.2.

    val options = Map("es.read.field.as.array.include" -> "*.score")

    val df = sqlContext.read.options(options).format("es").load("click_detail/rawlog")
    df.count()

I get:

    org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'ctp.score' not found; typically this occurs with arrays which are not mapped as single value

How can I fix this? I am using Spark 2.0.2 and ES 1.7.3.
Thanks.

My schema looks like this:

     |-- mit: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: integer (nullable = true)
     |    |    |-- score: double (nullable = true)
     |-- mrm: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: long (nullable = true)
     |    |    |-- score: double (nullable = true)
     |-- mtp: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: integer (nullable = true)
I also tried:

    val options = Map("es.read.field.as.array.include" -> "ctp,cit,mit,mrm,mtp,cimit,mimit")

but it still didn't work.