Spark code to get select fields from ES


(Kedar Dixit) #1

Hi,

I want to fetch only selected fields from ES using the Spark ES connector.

I have written some code that fetches all the documents in a given index, as below:

JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, searchIndex);

However, is there a way to get only specific fields from the documents in each index, rather than fetching everything?

Example: say my documents have many fields, one of which is @timestamp, so a response looks like { .............., @timestamp=Fri Jul 07 01:36:00 IST 2017, ..............}. How can I get only the @timestamp field for all my indexes?

I found something here but was unable to relate it to my case. Can someone help me, please?

Many Thanks!
~KD


(James Baiera) #2

@kedarsdixit If you are using Spark SQL: we provide a native integration with Spark SQL that pushes predicate filters and field projections down to Elasticsearch. For example, if you SELECT timestamp FROM ..., the connector recognizes that this is the only field needed and returns only the timestamp field to the executors processing the data.

Alternatively, if you are using plain Spark RDDs, which do not support query planning and schema optimizations the way Spark SQL does, we provide a configuration setting that takes the names of the fields you would like returned from the cluster (see es.read.source.filter in the docs).
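
As a rough illustration of the RDD route, here is a minimal sketch of the connector settings involved. It assumes es.read.source.filter takes a comma-separated list of field names; the class name EsReadSettings, the helper selectFields, and the index pattern "logs-*" are hypothetical and only there for the example. The Spark call itself is left as a comment so the snippet compiles without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the connector.
final class EsReadSettings {

    // Builds connector settings that ask Elasticsearch to return only the
    // named fields for each document read from the given resource.
    static Map<String, String> selectFields(String resource, String... fields) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("es.resource", resource);
        // Assumption: the setting accepts a comma-separated list of field names.
        cfg.put("es.read.source.filter", String.join(",", fields));
        return cfg;
    }

    public static void main(String[] args) {
        // "logs-*" is a placeholder index pattern.
        Map<String, String> cfg = selectFields("logs-*", "@timestamp");
        System.out.println(cfg);

        // With elasticsearch-hadoop on the classpath and an existing
        // JavaSparkContext jsc, the map would be handed to the connector:
        // JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, cfg);
    }
}
```

The same two keys could instead be set on the SparkConf before the context is created, which would apply them to every read in the job.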

Hopefully that helps!


(Kedar Dixit) #3

Thanks @james.baiera. I am using the Spark ES connector, and I figured out how to select the specific fields. Many thanks! ~Kedar

