Getting the index name of documents in Spark

(Kirilyuro) #1

ES-Spark's esRDD method returns the raw document (_source, in Elasticsearch terms) and the document's ID (_id in ES), but I also need additional information about each returned document, such as the index and type it comes from.

I am querying multiple indices, so my call to esRDD looks like this:
sparkContext.esRDD("index*/entities", query)
where the actual indices are "index1", "index2", etc. I want to know which specific index each document in the resulting RDD came from.

Can this be done?


(James Baiera) #2


Please take a look at this section of the documentation:

That should get you to where you want to go.
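For reference, one way to get at this in elasticsearch-hadoop is the es.read.metadata setting. When it is set to true, each document returned by esRDD should carry an extra map (under the key _metadata by default, configurable via es.read.metadata.field) holding fields such as _index, _type and _id. Below is a minimal sketch of reading the index name back out, assuming that nested-map shape. DocIndex and indexOf are hypothetical names used for illustration only; they are not part of the library.

```scala
import scala.collection.{Map => SMap}
import java.util.{Map => JMap}

// Hypothetical helper for illustration; not part of elasticsearch-spark.
object DocIndex {
  // Default key under which elasticsearch-hadoop nests document metadata
  // when es.read.metadata=true (changeable via es.read.metadata.field).
  val MetadataField = "_metadata"

  // Returns the name of the index a document came from, or None if the
  // document carries no metadata (e.g. es.read.metadata was left false).
  def indexOf(doc: SMap[String, AnyRef]): Option[String] =
    doc.get(MetadataField).flatMap {
      // Assumption: the metadata arrives as a nested Scala map...
      case m: SMap[_, _] => m.asInstanceOf[SMap[String, AnyRef]].get("_index")
      // ...but a Java map is handled too, in case the reader produces one.
      case m: JMap[_, _] => Option(m.asInstanceOf[JMap[String, AnyRef]].get("_index"))
      case _             => None
    }.map(_.toString)
}
```

On the Spark side, with import org.elasticsearch.spark._ in scope, this could be wired up roughly as sc.esRDD("index*/entities", query, Map("es.read.metadata" -> "true")).map { case (id, doc) => (id, DocIndex.indexOf(doc)) }, or by setting es.read.metadata on the SparkConf instead of per call.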

(Kirilyuro) #3

Yeah, did the trick. Thanks!
