ES-Spark : issue with 'read_field_include' in case of nested objects

Preeti_Raj_Buchhada · April 19, 2016, 11:18am

A typical record in my ES index looks like:

```json
"_source": {
  "app": "panoply",
  "response": {
    "category": "uncategorized",
    "subcategory": "uncategorized",
    "activity_common_name": "name123",
    "score": 0,
    "duration_secs": 2,
    "sub_activity": null,
    "activity": "name123"
  },
  "member_id": 2357919,
  "device_user_identity": 1688734,
  "activity_type": "type123",
  "response_timestamp": "2016-01-10T23:05:18.000Z"
}
```

I created a temporary table in the Spark shell as follows:

```scala
sql("""
  CREATE TEMPORARY TABLE jan10
  USING org.elasticsearch.spark.sql
  OPTIONS (
    resource 'cortez/data',
    nodes 'localhost',
    port '9201',
    scroll_size '500',
    query '?q=response_timestamp:[2016-01-01 TO 2016-01-10]',
    read_field_include 'member_id,response.category,response.subcategory,response.activity,response.activity_common_name,response.duration_secs,response.sub_activity,response_timestamp'
  ) """)
```

**Note:** `response.score` is not included in `read_field_include`.

I then executed:

```scala
sql("""SELECT * FROM jan10""").show()
```

All field values that follow `response.score` in the document (`duration_secs`, `sub_activity` and `activity`) show up as null.
If I add `response.score` to `read_field_include`, all field values are fetched correctly.
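The workaround above can be sketched as a small Scala helper that builds the `read_field_include` value and, separately, the columns to select. `IncludeWorkaround.expandIncludes` is hypothetical (it is not part of es-hadoop), the list of `response` sub-fields is copied from the sample document rather than read from the index mapping, and the idea that the fix generalizes to "request every sub-field of a nested object" is an assumption that this thread does not confirm.

```scala
// Hypothetical helper, not part of es-hadoop. It assumes that asking for every
// sub-field of a nested object avoids the nulls described above.
object IncludeWorkaround {

  /** Returns `wanted` plus any missing sub-fields of the nested objects it references. */
  def expandIncludes(wanted: Seq[String], subFields: Map[String, Seq[String]]): Seq[String] = {
    // Parent objects touched by the request, e.g. "response" for "response.category".
    val parents = wanted.filter(_.contains(".")).map(path => path.takeWhile(_ != '.')).distinct
    // Every sibling path under those parents, taken from the supplied field list.
    val siblings = parents.flatMap(p => subFields.getOrElse(p, Seq.empty).map(sub => s"$p.$sub"))
    // Keep the caller's order and append only the siblings that were left out.
    (wanted ++ siblings).distinct
  }
}

// Sub-fields of `response`, copied from the sample document (assumed complete).
val responseFields = Seq("category", "subcategory", "activity_common_name", "score",
  "duration_secs", "sub_activity", "activity")

// The fields actually wanted, i.e. the original read_field_include list.
val wanted = Seq("member_id", "response.category", "response.subcategory",
  "response.activity", "response.activity_common_name", "response.duration_secs",
  "response.sub_activity", "response_timestamp")

// Value for read_field_include: the original list with response.score appended.
val include = IncludeWorkaround.expandIncludes(wanted, Map("response" -> responseFields)).mkString(",")
// Column list for the SELECT, which still leaves response.score out of the result.
val projection = wanted.mkString(", ")

println(include)
println(projection)
// In spark-shell, `include` would go into the CREATE TEMPORARY TABLE OPTIONS
// and `projection` into a query such as s"SELECT $projection FROM jan10".
```

Keeping the two strings apart separates what es-hadoop is asked to read from what the query returns, so `response.score` is fetched only to sidestep the problem and is not part of the selected output.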

This looks like a bug.
Could you please take a look?
Thanks.

costin · April 21, 2016, 6:06am

It looks like a bug, probably one that causes nested fields to be skipped.
What version of ES-Hadoop are you using?

Cheers,

Preeti_Raj_Buchhada · April 21, 2016, 6:40am

Environment:

- ES: 1.3.2
- es-hadoop: elasticsearch-hadoop-2.3.0
- Spark: spark-1.6.1-bin-hadoop2.6
