Avoid es.mapping.id field duplication

Rishav_Rohit1 · November 22, 2016, 12:04pm

Hi,

I am using the Elasticsearch Hadoop connector (2.4.0) to store documents in ES. My documents have a "record_id" field, which I use as the _id field via the following configuration setting:
conf.set("es.mapping.id", "record_id");
As a result, record_id is stored twice: once in "_id" and again in the "record_id" field of the document body.
Which configuration setting should I use to avoid this duplication?

Thanks.

james.baiera · November 28, 2016, 4:32pm

@Rishav_Rohit1 You can specify which fields should be left out of the document body during serialization by using the es.mapping.exclude setting:

import org.elasticsearch.spark._  // brings saveToEs into scope for RDDs
sc.makeRDD(Map("id" -> "1", "name" -> "Jimmy") :: Nil)
    .saveToEs("test/test", Map("es.mapping.id" -> "id", "es.mapping.exclude" -> "id"))

This tells the connector to use the "id" field as the document's _id, but to omit it when creating the body of the document. Searching the index afterwards returns the following; note that _source contains only "name":

{
    "took":39,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":1.0,
        "hits":[
            {
                "_index":"test",
                "_type":"test",
                "_id":"1",
                "_score":1.0,
                "_source":{"name":"Jimmy"}
            }
        ]
    }
}
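
For the configuration in the original question, the equivalent would presumably be to add conf.set("es.mapping.exclude", "record_id"); next to the existing es.mapping.id line. The sketch below is a plain-Java model of that effect, not the connector's actual code: the class MappingExcludeSketch and its two helper methods are hypothetical names made up for illustration, and it only handles exact top-level field names, whereas the real setting may accept richer expressions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how es.mapping.id and es.mapping.exclude interact.
// This is NOT part of elasticsearch-hadoop; it only mimics the outcome.
public class MappingExcludeSketch {

    // Value that would become _id: the field named by es.mapping.id.
    static String documentId(Map<String, Object> doc, String idField) {
        return String.valueOf(doc.get(idField));
    }

    // Body that would be indexed: every field except those named in
    // es.mapping.exclude (treated here as a comma-separated list of exact names).
    static Map<String, Object> body(Map<String, Object> doc, String exclude) {
        List<String> excluded = Arrays.asList(exclude.split("\\s*,\\s*"));
        Map<String, Object> out = new LinkedHashMap<>(doc);
        out.keySet().removeAll(excluded);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("record_id", "1");
        doc.put("name", "Jimmy");

        // Same field name for both settings, as in the answer above.
        System.out.println("_id     = " + documentId(doc, "record_id"));
        System.out.println("_source = " + body(doc, "record_id"));
    }
}
```

The id is read from the original document before the exclusion is applied to the copy, which is why the same field can safely appear in both settings.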

Rishav_Rohit1 · November 30, 2016, 10:00am

Thank you

