Spark code to get select fields from ES

kedarsdixit · September 20, 2017, 2:00pm

Hi,

I want to get only select fields from ES using Spark ES connector.

I have written some code that fetches all the documents in a given index, as below:

JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, searchIndex);

However, is there a way to get only specific fields from the documents in every index in ES, rather than getting everything?

Example: Let's say I have many fields in the documents, and @timestamp is one of the fields in the response: { .............., @timestamp=Fri Jul 07 01:36:00 IST 2017, ..............}. How can I get only the @timestamp field for all my indexes?

I could see something here but was unable to relate it to my case. Can someone please help me?

Many Thanks!
~KD

james.baiera · October 4, 2017, 3:38am

@kedarsdixit If you are using Spark SQL: we provide a native integration with Spark SQL that allows you to push down predicate filters and field projections directly to Elasticsearch. For example, if you SELECT timestamp FROM ..., the connector will recognize that this field is the only one needed and will return only the timestamp field to the executors processing the data.

Alternatively, if you are using vanilla Spark RDDs, which do not support query planning and schema optimizations the way Spark SQL does, we provide a configuration that you can set with the names of the fields you would like returned from the cluster (see es.read.source.filter in the docs).

Hopefully that helps!
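
For readers on the RDD route, here is a minimal sketch of how the es.read.source.filter setting mentioned above might be assembled in Java. The class name EsFieldFilterConfig, the helper readSettings, and the index name "my-index" are made up for illustration and are not part of the connector. The actual read is left as a comment because it needs a running Spark context and cluster, and it assumes the JavaEsSpark.esRDD overload that takes a settings map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EsFieldFilterConfig {

    // Hypothetical helper: builds connector settings that ask Elasticsearch
    // to return only the listed fields for each document.
    public static Map<String, String> readSettings(String index, String... fields) {
        Map<String, String> cfg = new LinkedHashMap<>();
        // Index (resource) to read from.
        cfg.put("es.resource", index);
        // Comma-separated list of fields to return.
        cfg.put("es.read.source.filter", String.join(",", fields));
        return cfg;
    }

    public static void main(String[] args) {
        // "my-index" is a placeholder index name.
        Map<String, String> cfg = readSettings("my-index", "@timestamp");
        System.out.println(cfg);

        // With a live JavaSparkContext `jsc`, the settings could then be passed
        // to the connector (assumed overload taking a settings map):
        // JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, cfg);
    }
}
```

The same key could instead be set once on the SparkConf, which would apply it to every read in the job rather than to a single RDD.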

kedarsdixit · October 4, 2017, 4:13pm

Thanks @james.baiera. I am using the Spark ES connector, and I was able to figure out how to select the specific fields. Many thanks! ~Kedar

system · November 1, 2017, 4:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
