Delete duplicate items

Hi all,

I use Elasticsearch to store JSON documents like the following:

{
  "_index" : "normalized",
  "_type" : "90A2DAFB0621",
  "_id" : "Fri Sep 12 16:59:50 UTC 2014",
  "_score" : 1.0,
  "_source" : {"id":"2014-09-12T16:59:50.000Z","r":72.16,"o":74.3,"m":78.01,"s":66.99,"c":0.03,"p":2.77,"e":7.694444444444444E-6,"ec":1.8466666666666666E-6,"mo":0,"ot":64.31,"ecop":91}
}

Later on, I changed how "_id" is calculated in my program, so each record in the older data now exists twice, once under the old "_id" and once under the new one. I was able to find the duplicates with the aggregations API, grouping by "_type" and looking for "id" values that occur more than once:

{
  "aggs": {
    "types": {
      "terms": {
        "field": "_type"
      },
      "aggs": {
        "dups": {
          "histogram": {
            "field": "id",
            "interval": 1,
            "min_doc_count": 2
          }
        }
      }
    }
  }
}
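
For reference, I run the aggregation like this (host and port are just my local single-node setup, and dups_agg.json is simply the file I saved the query above into):

curl -XPOST 'http://localhost:9200/normalized/_search?search_type=count&pretty' --data-binary @dups_agg.json

The search_type=count part suppresses the hits so only the aggregation results come back.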

I can remove the old documents one by one using the delete API, as sketched below, but I wonder if there is a better solution.
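
For example, deleting the document shown above looks like this (the spaces in my old "_id" values have to be URL-encoded):

curl -XDELETE 'http://localhost:9200/normalized/90A2DAFB0621/Fri%20Sep%2012%2016:59:50%20UTC%202014'

I suppose I could at least batch these with the bulk API, feeding it one delete action per line from a file (deletes.json is just my name for it, and the file has to end with a newline):

curl -XPOST 'http://localhost:9200/_bulk' --data-binary @deletes.json

where deletes.json contains lines like:

{ "delete" : { "_index" : "normalized", "_type" : "90A2DAFB0621", "_id" : "Fri Sep 12 16:59:50 UTC 2014" } }

But that still means collecting every duplicate "_id" by hand first.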

Thanks a lot for your help!

Jingzhao
