I am using the Elasticsearch-Hadoop (2.4.0) connector to store documents in ES. My documents have a "record_id" field, which I use as the document's _id via the following configuration setting:
conf.set("es.mapping.id", "record_id");
As a result, the record_id value is stored twice: once in the "_id" field and again in the "record_id" field.
What configuration should I use to avoid duplication of this field?
@Rishav_Rohit1 You can also specify which fields should be left out of the document body during serialization by using the es.mapping.exclude setting:
es.mapping.id = id
es.mapping.exclude = id
This tells the connector to use the "id" field as the document's id but to ignore it when creating the body of the document, so the value ends up only in the "_id" metadata instead of being stored twice.
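Applied to the setup in the question, the fix should amount to adding conf.set("es.mapping.exclude", "record_id"); next to the existing es.mapping.id line. The Java sketch below does not call the connector. It is a hypothetical, stdlib-only simulation (the class MappingExcludeSketch and its serialize method are made up for illustration) of what the two settings do to a single document: the id is read from record_id, and record_id is then dropped from the body. It only handles plain, comma-separated top-level field names; the real setting also accepts patterns such as wildcards, which are not modeled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: this is NOT the ES-Hadoop implementation.
// It mimics the combined effect of es.mapping.id and es.mapping.exclude
// on one document so the behavior can be checked without a cluster.
public class MappingExcludeSketch {

    /** One "serialized" document: the _id metadata plus the remaining body. */
    public static final class BulkEntry {
        public final String id;
        public final Map<String, Object> body;

        BulkEntry(String id, Map<String, Object> body) {
            this.id = id;
            this.body = body;
        }
    }

    /**
     * Reads the id from the field named by es.mapping.id, then removes every
     * top-level field listed (comma-separated) in es.mapping.exclude from a
     * copy of the document. The input map is left unchanged.
     */
    public static BulkEntry serialize(Map<String, String> settings, Map<String, Object> doc) {
        String idField = settings.get("es.mapping.id");
        Object rawId = idField == null ? null : doc.get(idField);
        String id = rawId == null ? null : rawId.toString();

        Map<String, Object> body = new LinkedHashMap<>(doc);
        String exclude = settings.getOrDefault("es.mapping.exclude", "");
        for (String field : exclude.split(",")) {
            String name = field.trim();
            if (!name.isEmpty()) {
                body.remove(name);
            }
        }
        return new BulkEntry(id, body);
    }

    public static void main(String[] args) {
        // In the real Hadoop job the equivalent would be:
        //   conf.set("es.mapping.id", "record_id");
        //   conf.set("es.mapping.exclude", "record_id");
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.mapping.id", "record_id");
        settings.put("es.mapping.exclude", "record_id");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("record_id", "42");
        doc.put("name", "example");

        BulkEntry entry = serialize(settings, doc);
        System.out.println("_id  = " + entry.id);   // prints "_id  = 42"
        System.out.println("body = " + entry.body); // prints "body = {name=example}"
    }
}
```

One trade-off to keep in mind: once record_id is excluded, it is no longer part of the stored source, so lookups by that value would go through _id rather than a regular record_id field.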