How to reindex distinct data in elasticsearch


(Gayathri TR) #1

Hi Team,

I am able to reindex the data in elasticsearch using:

  • curl -XPOST 'http://localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
    {

    "source": {
    "index": "old_index"
    },
    "dest": {
    "index": "new_index"
    },
    "script": {
    "inline": "ctx._source.field_new = ctx._source.remove(\"field\")"
    }
    }'

I have many duplicate log entries in my index. I want to reindex while removing the duplicate entries.
Could you please suggest a method?


(Mark Harwood) #2

The scripts that I use for entity-centric indexing [1] sort the content of a source index by a common key and consolidate multiple docs into an update on a single document in the target index. The "pull" from the source index and the "push" to the target index are both done using the respective bulk APIs.
Your use case is slightly different in that you want to insert a single doc into the target index rather than update one, but you should be able to adapt the included Python script with a few changes.
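This isn't the script from [1], just a minimal sketch of the core idea, written against plain Python dicts so it runs without a cluster. In a real pipeline the function would sit between the scroll/scan "pull" from the source index and the bulk "push" into the target index; the field names `host` and `message` are placeholders for whatever identifies a duplicate in your logs.

```python
def dedupe_by_key(docs, key_fields):
    """Keep the first doc seen for each key; later duplicates are dropped.

    `docs` is an iterable of dicts (e.g. the _source of each hit pulled
    from the source index); `key_fields` names the fields that define
    what counts as "the same" log entry.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = tuple(doc.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

logs = [
    {"host": "web1", "message": "disk full"},
    {"host": "web1", "message": "disk full"},  # exact duplicate
    {"host": "web2", "message": "disk full"},
]
print(dedupe_by_key(logs, ["host", "message"]))
```

Each surviving doc would then be bulk-indexed into the target index as a plain insert rather than an update.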

[1] http://bit.ly/entcent


(Kimbro Staken) #3

Just a suggestion, come up with a way to create a key from the data in the log entries and then use create requests to save the data into a new index under that key. If there are duplicate entries then only the first will be indexed and the duplicates will be dropped. Unfortunately, I don't know if the reindex api supports this.
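One hedged sketch of the key-derivation step: hash the fields that define "sameness" into a deterministic document ID, then index each doc with create semantics (e.g. `op_type=create`), so the second write of the same ID is rejected by Elasticsearch instead of stored twice. The choice of `@timestamp` and `message` as key fields below is illustrative, not prescriptive.

```python
import hashlib
import json

def doc_id(doc, key_fields):
    """Deterministic _id from the fields that identify a log entry.

    Two identical entries hash to the same ID, so indexing the second
    one with create semantics fails rather than producing a duplicate.
    """
    key = json.dumps([doc.get(f) for f in key_fields], sort_keys=True)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

a = {"@timestamp": "2017-01-01T00:00:00Z", "message": "disk full"}
b = dict(a)  # an exact duplicate of a
print(doc_id(a, ["@timestamp", "message"]))
print(doc_id(b, ["@timestamp", "message"]))
```

Both prints show the same 40-character hex ID, which is what makes the create request for the duplicate fail.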

Kimbro


(Gayathri TR) #4

Is it possible to remove the duplicate log entries from an existing index ?


(Kimbro Staken) #5

There's no easy way to do that. You'd still have to walk the entire index, figure out which entries are duplicates, and then delete them. It's a lot simpler to reindex and drop the duplicates as part of that process.
