Getting the index name of documents in Spark


(Kirilyuro) #1

ES-Spark's esRDD method returns each document's raw source (_source in Elasticsearch terms) and its id (_id), but I also need additional information about the returned documents, such as the index name and type each document comes from.

I am querying multiple indices, i.e. my call to esRDD looks like this:
sparkContext.esRDD("index*/entities", query)
and the actual indices are "index1", "index2", etc. So, I want to know which specific index each of the documents in the resulting RDD came from.

Can this be done?

Thanks


(James Baiera) #2

Hello!

Please take a look at this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#_metadata_when_reading_from_elasticsearch

That should get you to where you want to go.
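
For reference, a minimal sketch of what that section describes, assuming the es.read.metadata setting is enabled and the metadata lands under the default "_metadata" field of each document map. The EsMetadata object and the metadataValue helper are hypothetical names made up for illustration, and the esRDD call is left as a comment because it needs Spark and elasticsearch-spark on the classpath. The exact runtime type of the metadata map is also an assumption here.

```scala
import scala.collection.Map

object EsMetadata {
  // Connector setting from the linked docs: asks the connector to attach
  // document metadata to each returned document.
  val readCfg: Map[String, String] = Map("es.read.metadata" -> "true")

  // Assumed usage (requires Spark and elasticsearch-spark):
  //   import org.elasticsearch.spark._
  //   val docs = sparkContext.esRDD("index*/entities", query, EsMetadata.readCfg)
  //   val tagged = docs.map { case (id, doc) =>
  //     (id, EsMetadata.metadataValue(doc, "_index"), doc)
  //   }

  // Hypothetical helper: looks up one metadata key (e.g. "_index" or "_type")
  // in the nested metadata map, returning None if either level is missing.
  def metadataValue(doc: Map[String, AnyRef],
                    key: String,
                    metadataField: String = "_metadata"): Option[String] =
    doc.get(metadataField) match {
      case Some(meta: Map[_, _]) =>
        meta.asInstanceOf[Map[String, AnyRef]].get(key).map(_.toString)
      case _ => None
    }
}
```

If a different name is set through es.read.metadata.field, pass it as metadataField. Which keys are present (for example "_type") may depend on the Elasticsearch version, so the helper returns an Option rather than assuming they exist.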


(Kirilyuro) #3

Yeah, did the trick. Thanks!

