Avoid es.mapping.id field duplication

Hi,

I am using the Elasticsearch Hadoop connector (2.4.0) to store documents in ES. My documents have a "record_id" field, which I use as the _id field through the following configuration setting:
conf.set("es.mapping.id", "record_id");
As a result, record_id is stored twice: once in the "_id" field and again in the "record_id" field of the document body.
What configuration should I use to avoid duplicating this field?

Thanks.

@Rishav_Rohit1 You can specify which fields should be excluded from the document body during serialization by using the es.mapping.exclude setting:

import org.elasticsearch.spark._  // brings saveToEs into scope; sc is an existing SparkContext
sc.makeRDD(Map("id" -> "1", "name" -> "Jimmy") :: Nil)
    .saveToEs("test/test", Map("es.mapping.id" -> "id", "es.mapping.exclude" -> "id"))

This tells the connector to use the "id" field as the document's ID but to leave it out of the document body. Searching the index afterwards returns the following, where _source no longer contains "id":

{
    "took":39,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":1.0,
        "hits":[
            {
                "_index":"test",
                "_type":"test",
                "_id":"1",
                "_score":1.0,
                "_source":{"name":"Jimmy"}
            }
        ]
    }
}
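
For the Hadoop configuration in the original question, the same pair of settings should apply with "record_id" in place of "id". Below is a minimal Java sketch of that idea; the EsMappingSettings class and the idWithoutDuplicate helper are hypothetical names used only for illustration and are not part of the connector. It builds just the two settings discussed above and assumes you copy them onto your own job Configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EsMappingSettings {

    // Hypothetical helper: returns the two connector settings from this thread
    // for a given field name, in insertion order.
    public static Map<String, String> idWithoutDuplicate(String idField) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.mapping.id", idField);      // use this field as the document _id
        settings.put("es.mapping.exclude", idField); // and leave it out of the document body
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> settings = idWithoutDuplicate("record_id");

        // In a Hadoop job you would copy each entry onto the job's Configuration,
        // for example: settings.forEach(conf::set);
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In practice this is the existing conf.set("es.mapping.id", "record_id") call plus one more, conf.set("es.mapping.exclude", "record_id").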

Thank you
