NullPointerException when setting the "es.read.field.as.array.include" option

With Spark + ES, when I set:

conf.set("es.read.field.as.array.include", "api_search_ads_print_data.ads");

I got the following exception:

java.lang.NullPointerException
at java.lang.String.startsWith(String.java:1405)
at java.lang.String.startsWith(String.java:1434)
at org.elasticsearch.hadoop.serialization.field.FieldFilter.filter(FieldFilter.java:105)
at org.elasticsearch.hadoop.serialization.field.FieldFilter.filter(FieldFilter.java:132)
at org.elasticsearch.hadoop.serialization.builder.JdkValueReader.addToArray(JdkValueReader.java:121)

Is it clear enough or do you need more details?

Adding the version of the connector you are working with and the versions of the integrating technologies (ES, Spark, Hadoop, etc.) helps a great deal in troubleshooting these problems.

It's also normally helpful to include a small number of test records/mappings that can allow us to reproduce this faster.

I am using:

  • Spark 1.6.2
  • elasticsearch 2.4
  • elasticsearch-hadoop: 2.4

I will try to set up a small sample tomorrow.

Thanks,

Hello, this is quite easy to reproduce with a single document:

{
  "a": {
    "b": [
      {
        "c": "hello"
      }
    ]
  }
}

Setting conf.set("es.read.field.as.array.include", "a.b"); causes the NullPointerException. (I only see this error when I use an RDD, not when I use a DataFrame.)

Actually, currentFieldName is null at this line: https://github.com/elastic/elasticsearch-hadoop/blob/2.4/mr/src/main/java/org/elasticsearch/hadoop/serialization/builder/JdkValueReader.java#L121
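
For what it's worth, the stack trace is consistent with that null field name being passed to String.startsWith, which throws when its argument is null. Here is a minimal plain-Java sketch of the failure mode. It is not the connector's actual code: the class and method names are hypothetical, and I am assuming the include pattern is the receiver of startsWith and the field name is the argument.

```java
// Hypothetical sketch of the failure mode; NOT the real FieldFilter code.
class NullFieldNameSketch {

    // Include check that assumes fieldName is never null.
    // String.startsWith dereferences its argument, so a null
    // fieldName raises NullPointerException inside the JDK.
    static boolean naiveMatch(String includePattern, String fieldName) {
        return includePattern.startsWith(fieldName);
    }

    // Same check with a null guard: a missing field name simply does not match.
    static boolean guardedMatch(String includePattern, String fieldName) {
        return fieldName != null && includePattern.startsWith(fieldName);
    }
}
```

Calling NullFieldNameSketch.naiveMatch("a.b", null) throws, while guardedMatch("a.b", null) returns false. Whether a guard like this is the right fix in the connector is a separate question, since the field name probably should not be null there in the first place.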

@ebuildy were you able to resolve this? I am seeing the exact same error while trying to do an ETL from one index to another.

Hello,

Nope, there are plenty of issues with arrays (the latest one I found, "Bug when reading if a field has no mapping (empty array by ex.)", is driving me crazy ^^).

So I replaced the array with a flat structure, and I have had no problems since.

(That is, fields like toto_1, toto_2, toto_3 instead of toto: [].)
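
In case it helps, here is a rough sketch of that flattening step, applied to a document before it is indexed. It is plain Java with no Spark or ES dependency. The class and method names are hypothetical, and it assumes the document is available as a Map and that the array holds simple values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the workaround; not part of elasticsearch-hadoop.
class FlattenArraySketch {

    // Returns a copy of doc in which the list stored under `field` is replaced
    // by numbered fields (field_1, field_2, ...). Other entries are copied as-is.
    // An empty list produces no fields at all.
    static Map<String, Object> flatten(Map<String, Object> doc, String field) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getKey().equals(field) && e.getValue() instanceof List) {
                List<?> values = (List<?>) e.getValue();
                for (int i = 0; i < values.size(); i++) {
                    // Numbering starts at 1 to match toto_1, toto_2, ...
                    out.put(field + "_" + (i + 1), values.get(i));
                }
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

With this, a document holding toto: ["x", "y", "z"] ends up with toto_1, toto_2 and toto_3, and the reader never has to deal with an array. The obvious downside is that the number of fields depends on the data.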