Field not found; typically this occurs with arrays which are not mapped as single value

(Kraney) #1

I'm a new user, and I'm running into this error. Indeed the field it's complaining about is an array.

I'm running elasticsearch-hadoop-2.2.0-rc1

I saw an earlier topic on this error and attempted the fix suggested there, with no improvement. That could most definitely be some newbie mistake I am making.

I am attempting to set the option as follows (in spark-shell):

val options = Map("es.read.field.as.array.include" -> "*addr_geo.*.geoname_id")
var flows = sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("flows-*/flow_log")

But observe no change in behavior. Is there some other way I should set this option?

As a minor comment, it would be helpful if the documentation of these options included an example in scala. I can see what the option is and what value it should have, but it's not clear to me where to set it.
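For what it's worth, a minimal sketch of where such options go in spark-shell: they are plain string key/value pairs collected in a Map and handed to `DataFrameReader.options(...)` before `load()`. The index name and pattern below are the ones from this thread; `sqlContext` assumes a Spark 1.x shell, as typically used with es-hadoop 2.2.0.

```scala
// Sketch: es-hadoop read options are plain key -> value strings in a Map
// that is later passed to DataFrameReader.options(...).
val options = Map(
  "es.read.field.as.array.include" -> "*addr_geo.*.geoname_id"
)

// In a Spark 1.x spark-shell (sqlContext is predefined), the Map would
// then be used roughly like this:
//   val flows = sqlContext.read
//     .format("org.elasticsearch.spark.sql")
//     .options(options)
//     .load("flows-*/flow_log")
```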

(Costin Leau) #2

Please use the GA version (2.2.0) and then report back.

(Kraney) #3

I obtained the GA release and tried again.

Actually from printSchema I can see that the change is taking effect and the field is being treated as an array, but I still get the error.

I think more importantly, this is a field that may or may not be present in the document at all.

(Kraney) #4

It doesn't seem to be related to the field not always being present; I see the same error even if I specify a query that restricts things down to only documents that contain the field. It seems rather to be related to the document being deeply nested.

When I add fields more than one 'dot' deep to the `es.read.field.as.array.include` setting, it doesn't seem to work. I can make the error go away if I exclude the entire structure, but I need access to some of the data at the same depth.

I'm sorry my description is a bit vague. I attempted to attach the Elasticsearch mapping and a sample document for this data, but it was too big to be posted here. Is there a place to post such information?

(Kraney) #5

Ahah! I have finally found my rookie mistake.

Not only was the field mentioned in the error an array, but one of its parents was also an array. I was only flagging the leaf field in `es.read.field.as.array.include`, not its parent.

If a.b is an array of structs, the error is reported on the first leaf-level field (like a.b.c.d) and not on a.b. Once I realized my mistake, I was able to clear all of the errors.

(Costin Leau) #6

Can you expand on this? What was your mapping and what was the exception that occurred initially and what was the configuration fix?


(Kraney) #7

Sure - let's say the data looks like this:

      'a': {
        'b': [
          {'c': { 'd': [1, 2, 3, 4] } },
          {'c': { 'd': [5, 6, 7, 8] } }
        ]
      }
In this case, I should specify es.read.field.as.array.include = a.b,a.b.c.d

If I do not specify anything, I get the error Field a.b.c.d not found. Seeing the error with field a.b.c.d, I went and specified es.read.field.as.array.include = a.b.c.d.

Having done so, there's still an error: field a.b is an array too, and I hadn't told es-spark about it. But the error you get still says Field a.b.c.d not found. Since the error was the same as before, I thought my change was having no effect.
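The takeaway above can be sketched in a few lines of plain Scala (the helper name here is my own, not part of es-hadoop): when the error names a leaf field, every ancestor prefix of that path is a candidate that may itself be an array and need its own entry in `es.read.field.as.array.include`.

```scala
// Hypothetical helper, not part of es-hadoop: given the leaf path from
// the error message, list every ancestor prefix to check against the
// mapping, since any of them may also be an array.
def ancestorPaths(leaf: String): Seq[String] = {
  val parts = leaf.split('.')
  (1 until parts.length).map(i => parts.take(i).mkString("."))
}

// For the error "Field a.b.c.d not found", the candidates to consider
// alongside a.b.c.d itself are a, a.b and a.b.c; here a.b was the
// array that was missing from the include list.
val candidates = ancestorPaths("a.b.c.d")
```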

(Chanh Le) #8

I currently have the same problem even though I am using 2.4.2.

val options = Map("es.read.field.as.array.include" -> "*.score")

val df = spark.read.format("es").options(options).load("click_detail/rawlog")

I get
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'ctp.score' not found; typically this occurs with arrays which are not mapped as single value

How do I fix that?
I am using Spark 2.0.2 and ES 1.7.3.

(Chanh Le) #9

My schema looks like this:
|-- mit: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: integer (nullable = true)
| | |-- score: double (nullable = true)
|-- mrm: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: long (nullable = true)
| | |-- score: double (nullable = true)
|-- mtp: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: integer (nullable = true)

I also tried val options = Map("es.read.field.as.array.include" -> "ctp,cit,mit,mrm,mtp,cimit,mimit")

but it still didn't work.
