Spark SQL to query array fields in Elasticsearch

tridib · September 11, 2015, 1:39pm

I have integrated Elasticsearch with Spark using the elasticsearch-spark connector. I can query Elasticsearch through Spark.

But the query does not return any results when I put an array field in the WHERE clause.
The Elasticsearch document looks like this:
{
    "_index": "ppm",
    "_type": "docs",
    "_id": "AU-5lX7CCHEkhOBzqUsa",
    "_version": 1,
    "_score": 1,
    "_source": {
        "member_id": 1,
        "year": "2015",
        "r": [
            "R1",
            "R3"
        ]
    }
}
SQL used: select member_id from table_xxx where r = 'R1'
No result returned.

I tried the Hive function array_contains(r, 'R1'), but it reports that it expected an array and found a string. It looks like Elasticsearch exposes the array as a string type.
Is there a different way to query array fields?
How can I see the native Elasticsearch JSON query that corresponds to the SQL?

Thanks
-Tridib

eliasah · September 20, 2015, 10:00am

Can you elaborate on how you are querying Elasticsearch through Spark?

tridib · September 23, 2015, 5:44am

I register the ES index "ppm" as "table_xxx". Then I query "table_xxx" with standard Spark SQL. This setup works for other simple conditions in the WHERE clause, but it does not work for the array type.

eliasah · September 24, 2015, 3:58pm

You know, once the index is in Spark, querying it is out of Elasticsearch's scope. To query the table you have created, you should consider using functions from the Spark SQL API.

tridib · September 25, 2015, 4:45pm

As I mentioned in my original post, the Spark SQL query with "array_contains(r, 'R1')" did not work with Elasticsearch. According to the elasticsearch-hadoop connector docs, this should work.

eliasah · September 25, 2015, 4:58pm

Can you share the reference (documentation) where it says that this should work?

costin · September 27, 2015, 10:50am

This is a known issue stemming from the fact that ES doesn't treat arrays differently from single values. This means that when reading a mapping, one doesn't know whether a certain field is an array or not until after reading all the values (as the array can appear at any point in time).
Typically this is not an issue. However, since Spark SQL requires the schema to be known beforehand and to be fixed, it causes problems when the underlying format changes.
There's an issue raised to address this which should hit master soon.
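
The mismatch described above can be sketched in plain Python, without Spark or Elasticsearch. This is only an illustration: the simplified `mapping` dict and both helper functions are hypothetical, and only the document comes from the original post. A schema derived from the mapping alone declares `r` as a string, while the stored value is a list.

```python
# Hypothetical, simplified mapping for the "ppm" index: an Elasticsearch mapping
# records the element type of a field, not whether it holds one value or many.
mapping = {"member_id": "long", "year": "string", "r": "string"}

# The _source of the document from the original post.
doc = {"member_id": 1, "year": "2015", "r": ["R1", "R3"]}


def schema_from_mapping(mapping):
    """Derive a fixed schema up front, as a SQL engine needs to (illustrative)."""
    return dict(mapping)


def fields_that_are_arrays(doc):
    """Arrays can only be discovered by inspecting the actual values."""
    return sorted(name for name, value in doc.items() if isinstance(value, list))


schema = schema_from_mapping(mapping)
array_fields = fields_that_are_arrays(doc)

print(schema["r"])    # what the mapping-derived schema claims about "r"
print(array_fields)   # which fields actually hold arrays in this document
```

Under these assumptions, `r = 'R1'` compares a multi-valued field against a single string, and array_contains is rejected because the declared type is a string rather than an array.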

tridib · September 27, 2015, 6:18pm

From the elasticsearch-hadoop docs and costin's presentation, I got the impression that all of Spark SQL would work.

tridib · September 27, 2015, 6:19pm

Thanks for your response. I will wait for the fix.
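
Later es-hadoop releases document a read setting, `es.read.field.as.array.include`, for declaring up front which fields should be read as arrays. Below is a minimal sketch of building the connector options for the index in this thread. The helper `es_read_options` is hypothetical, and the commented-out usage assumes a PySpark session named `spark` with the elasticsearch-spark jar on the classpath; it is not run here.

```python
def es_read_options(resource, array_fields):
    """Build elasticsearch-spark read options (hypothetical helper)."""
    return {
        # Index/type to read, e.g. "ppm/docs" from the original post.
        "es.resource": resource,
        # Comma-separated field names the connector should read as arrays.
        "es.read.field.as.array.include": ",".join(array_fields),
    }


options = es_read_options("ppm/docs", ["r"])
print(options)

# Assumed usage, not executed in this sketch:
# df = spark.read.format("org.elasticsearch.spark.sql").options(**options).load()
# df.createOrReplaceTempView("table_xxx")
# spark.sql("SELECT member_id FROM table_xxx WHERE array_contains(r, 'R1')").show()
```

Whether this setting is available depends on the connector version in use, so check the configuration reference for your release.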
