Avoid es.mapping.id field duplication

(Rishav Rohit) #1


I am using the Elasticsearch Hadoop connector (2.4.0) to store documents in ES. My documents have a "record_id" field, which I use as the _id field via the following configuration setting:
conf.set("es.mapping.id", "record_id");
As a result, record_id is stored twice: once in the "_id" field and again in the "record_id" field of the document body.
What configuration should I use to avoid duplicating this field?


(James Baiera) #2

@Rishav_Rohit1 You can also specify which fields should be excluded from the document body during serialization by using the es.mapping.exclude setting:

import org.elasticsearch.spark._  // adds saveToEs to RDDs

sc.makeRDD(Map("id" -> "1", "name" -> "Jimmy") :: Nil)
    .saveToEs("test/test", Map("es.mapping.id" -> "id", "es.mapping.exclude" -> "id"))

This tells the connector to use the "id" field as the document's ID but to leave it out of the document body. The indexed document therefore has the _id "1" and a body containing only the "name" field.
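
To make the effect of the two settings concrete, here is a rough sketch in plain Java that models them with ordinary maps. It does not use the connector at all: the class MappingExcludeSketch and the helpers extractId and buildBody are hypothetical names made up for this illustration, and it handles only a single field name, whereas the real es.mapping.exclude setting is more flexible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: models the effect of es.mapping.id and es.mapping.exclude
// with plain maps. This is not the connector's code; all names are hypothetical.
public class MappingExcludeSketch {

    // Value that would become the document's _id (the role of es.mapping.id).
    static String extractId(Map<String, Object> doc, String idField) {
        Object id = doc.get(idField);
        if (id == null) {
            throw new IllegalArgumentException("missing id field: " + idField);
        }
        return String.valueOf(id);
    }

    // Copy of the document without the excluded field (the role of es.mapping.exclude).
    // The input map is left unchanged.
    static Map<String, Object> buildBody(Map<String, Object> doc, String excludedField) {
        Map<String, Object> body = new LinkedHashMap<>(doc);
        body.remove(excludedField);
        return body;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("record_id", "1");
        doc.put("name", "Jimmy");

        // The id is read from the field, and the body no longer repeats it.
        System.out.println("_id  = " + extractId(doc, "record_id"));
        System.out.println("body = " + buildBody(doc, "record_id"));
    }
}
```

For the Hadoop job in the original question, the corresponding change would presumably be one extra line next to the existing es.mapping.id setting: conf.set("es.mapping.exclude", "record_id");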


(Rishav Rohit) #3

Thank you
