Retrieve stored fields from Elasticsearch via the elasticsearch-hadoop connector

edheck · October 18, 2017, 2:06pm

Here is our situation. Our documents have the following structure:

{
   'key1': 'some_val',
   'key2': 'some_very_large_binary_value',
   ...
}

Because the value of 'key2' is a large binary value, and because we don't need it in Kibana or in most of our analytical jobs, we exclude it from '_source' and make it a stored field. Only one type of analysis needs it, and for that we fetch it with the following query:

GET /_search

{
    "stored_fields" : ["key2"],
    "query" : {
        "term" : { "key1" : "some_value" }
    }
}
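For context, a mapping that produces this kind of setup could look roughly like the sketch below, sent as the body of an index-creation request such as `PUT /my_index`. The index name `my_index`, the type name `doc`, and the `keyword` type for 'key1' are placeholders and assumptions, not our exact settings; only the `_source` exclusion and the `store` flag matter here.

```json
{
  "mappings": {
    "doc": {
      "_source": {
        "excludes": ["key2"]
      },
      "properties": {
        "key1": { "type": "keyword" },
        "key2": { "type": "binary", "store": true }
      }
    }
  }
}
```

On Elasticsearch 7.0 and later the type level (`doc`) is dropped and `_source` and `properties` sit directly under `mappings`. With a mapping like this, 'key2' is absent from '_source' and can only be returned through `stored_fields`.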

The stored-fields query works fine as long as we are not using Spark. With Spark, the value of 'key2' comes out as null in every document.

We use Spark for large-scale analysis, hence the elasticsearch-hadoop connector. We looked at the request that the connector eventually generates. It looks like this:

POST /_search?sort=_doc&scroll=5m&size=50&_source=key1,key2

Why is the connector requesting 'key2' through '_source'? Since 'key2' is excluded from '_source', that is why we get null in each retrieved document.

Is there some configuration we are missing?

