Hi,
I am using the Elasticsearch Hadoop connector (2.4.0) to store documents in ES. My documents have a "record_id" field, which I use as the _id field via the following configuration setting:
conf.set("es.mapping.id", "record_id");
As a result, record_id is stored twice: once in the "_id" field and again in the "record_id" field.
What configuration should I use to avoid duplicating this field?
Thanks.
@Rishav_Rohit1 You can specify which fields should be excluded from the document body during serialization by using the es.mapping.exclude setting:
import org.elasticsearch.spark._  // adds saveToEs to RDDs
// sc is an existing SparkContext
sc.makeRDD(Map("id" -> "1", "name" -> "Jimmy") :: Nil)
  .saveToEs("test/test", Map("es.mapping.id" -> "id", "es.mapping.exclude" -> "id"))
This tells the connector to use the "id" field as the document's ID but to leave it out of the document body. Searching the index afterwards returns:
{
  "took": 39,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 1.0,
        "_source": {"name": "Jimmy"}
      }
    ]
  }
}
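
For the Hadoop configuration in the original question, the equivalent would presumably be conf.set("es.mapping.exclude", "record_id") alongside the existing es.mapping.id line. To make the effect of the two settings concrete without needing a cluster, here is a minimal plain-Java sketch. It is only an illustration of the behavior described above, not the connector's actual code: the class MappingExcludeSketch and the helpers documentId and documentBody are hypothetical names invented for this example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MappingExcludeSketch {

    // Hypothetical helper: mimics es.mapping.id by reading the value
    // that would become the document's _id.
    public static String documentId(Map<String, Object> doc, String idField) {
        return String.valueOf(doc.get(idField));
    }

    // Hypothetical helper: mimics es.mapping.exclude by copying the
    // document and dropping the excluded fields from the copy.
    public static Map<String, Object> documentBody(Map<String, Object> doc,
                                                   Set<String> excluded) {
        Map<String, Object> body = new LinkedHashMap<>(doc);
        body.keySet().removeAll(excluded);
        return body;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("record_id", "1");
        doc.put("name", "Jimmy");

        // Same field named in both settings, as in the answer above.
        String id = documentId(doc, "record_id");
        Map<String, Object> body =
                documentBody(doc, Collections.singleton("record_id"));

        System.out.println("_id     = " + id);
        System.out.println("_source = " + body);
    }
}
```

The ID is read from the full document while the body is built from a filtered copy, which is why the same field can be named in both settings without conflict.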