Delete all docs that have duplicate field values

alfianaf · February 9, 2022, 9:59am

Hello, I ingested some data into my cluster, and it turned out that some of it was ingested twice, so some documents are duplicated.
My index has a field that should be unique. Because some documents are doubled, I'm trying to find the duplicated values with a terms aggregation. Here's my command:

GET <index>/_search
{
    "size": 10000,
    "aggs": {
        "duplicateNames": {
            "terms": {
                "field": "EmployeeName",
                "min_doc_count": 2
            }
        }
    }
}

With that command, I can find the duplicated values. But is there any way to delete just the "duplicate" document and keep the original one?
TL;DR: can I delete one of two documents that share the same field value?

Any help is appreciated, thanks.

alfianaf · February 9, 2022, 6:22pm

Does anyone have any clue about this?

ylasri · February 9, 2022, 6:49pm

Have a look at this post
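
For readers without access to the linked post, here is a minimal sketch of one common approach, which may differ from what that post describes: group the documents by the field that should be unique, keep the first document seen for each value, and collect the `_id`s of the rest for deletion. The helper names `find_duplicate_ids` and `build_bulk_delete_body` and the index name `employees` are hypothetical. The sketch assumes the hits have already been fetched from the index in search-response form (each with `_id` and `_source`), and it only builds a request body for the `_bulk` API; it does not send anything to a cluster.

```python
import json
from collections import defaultdict


def find_duplicate_ids(hits, field="EmployeeName"):
    """Return the _id of every hit except the first one seen per field value.

    `hits` is assumed to be a list of dicts shaped like Elasticsearch search
    hits: {"_id": ..., "_source": {...}}. The order of `hits` decides which
    document counts as the "original" and is kept.
    """
    ids_by_value = defaultdict(list)
    for hit in hits:
        value = hit["_source"].get(field)
        if value is None:
            # Documents without the field are never treated as duplicates.
            continue
        ids_by_value[value].append(hit["_id"])

    to_delete = []
    for ids in ids_by_value.values():
        # Keep the first _id, mark the remaining ones for deletion.
        to_delete.extend(ids[1:])
    return to_delete


def build_bulk_delete_body(index, ids):
    """Build a newline-delimited JSON body of delete actions for _bulk."""
    lines = [json.dumps({"delete": {"_index": index, "_id": doc_id}})
             for doc_id in ids]
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample_hits = [
        {"_id": "1", "_source": {"EmployeeName": "Alice"}},
        {"_id": "2", "_source": {"EmployeeName": "Alice"}},
        {"_id": "3", "_source": {"EmployeeName": "Bob"}},
    ]
    duplicates = find_duplicate_ids(sample_hits)
    print(build_bulk_delete_body("employees", duplicates), end="")
```

Which copy is kept depends entirely on the order of the hits, so sort them (for example by an ingest timestamp, if the documents have one) before calling the helper. Check the list of `_id`s against a few known duplicates before sending any delete request.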

alfianaf · February 10, 2022, 1:47am

Hi, thanks for the answer.
I will look into this ASAP and let you know after I try this method.

alfianaf · February 10, 2022, 10:48am

Thanks for the help, I finally got it working. I appreciate it.

system · March 10, 2022, 10:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
