Getting the index name of documents in Spark

kirilyuro · June 13, 2016, 9:05am

ES-Spark's esRDD method returns the raw document (_source, in Elasticsearch terms) and the document's id (_id in ES), but I also need additional information about the returned documents, such as the index name and type each document comes from.

I am querying multiple indices, i.e. my call to esRDD looks like this:
sparkContext.esRDD("index*/entities", query)
and the actual indices are "index1", "index2", etc. So, I want to know which specific index each of the documents in the resulting RDD came from.

Can this be done?

Thanks

james.baiera · June 13, 2016, 2:45pm

Hello!

Please take a look at this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#_metadata_when_reading_from_elasticsearch

That should get you to where you want to go.
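
For anyone landing here later, a rough sketch of what that section leads to, as far as I understand it: turning on the es.read.metadata setting should make each returned document carry an extra entry (named _metadata by default, configurable through es.read.metadata.field) that holds _index, _type and so on. The helper name indexAndType below is made up for illustration, and the exact shape of the metadata entry (a nested Scala map) is an assumption worth checking against your es-hadoop version.

```scala
// Sketch only: assumes `es.read.metadata` is enabled, in which case each
// document map is expected to contain a "_metadata" entry (the default value
// of `es.read.metadata.field`) with keys such as "_index", "_type" and "_id".
//
// With Spark and elasticsearch-spark on the classpath, the read would look
// roughly like this (not executed here):
//   import org.elasticsearch.spark._
//   val rdd = sparkContext.esRDD("index*/entities", query,
//                                Map("es.read.metadata" -> "true"))
//   val tagged = rdd.map { case (id, doc) =>
//     (id, MetadataSketch.indexAndType(doc), doc)
//   }

object MetadataSketch {
  // Hypothetical helper: extracts (index, type) from a document's metadata
  // entry, or None if the entry is missing or not shaped as assumed.
  def indexAndType(doc: scala.collection.Map[String, Any],
                   metadataField: String = "_metadata"): Option[(String, String)] =
    doc.get(metadataField) match {
      case Some(meta: scala.collection.Map[_, _]) =>
        val m = meta.asInstanceOf[scala.collection.Map[String, Any]]
        for {
          index <- m.get("_index")
          tpe   <- m.get("_type")
        } yield (index.toString, tpe.toString)
      case _ => None
    }
}
```

The helper can be tried without a cluster by feeding it a hand-built map shaped like the assumed output, e.g. MetadataSketch.indexAndType(Map[String, Any]("_metadata" -> Map("_index" -> "index1", "_type" -> "entities"))). Returning an Option keeps the job from failing on documents where the metadata entry is absent, for instance if the setting was not picked up.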

kirilyuro · June 14, 2016, 4:07pm

Yeah, did the trick. Thanks!
