Delete all docs that have duplicate field values

Hello, I ingested some data into my cluster, and it turned out that some of it was ingested twice, so some documents are duplicated.
One field in my index is supposed to be unique. Because some of its values now appear twice, I'm trying to find the duplicates with a terms aggregation. Here's my request:

GET <index>/_search
{
    "size": 10000,
    "aggs": {
        "duplicateNames": {
            "terms": {
                "field": "EmployeeName",
                "min_doc_count": 2
            }
        }
    }
}

With that request I can find the duplicated values. But is there any way to delete just the "duplicate" document and keep the original one?
TL;DR: can I delete one of two documents that have the same field value?

Any help is appreciated. Thanks!

Does anyone have any clue about this?

Have a look at this post.
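
For readers who just want the general idea, here is a rough sketch in Python. It is not necessarily the method from the linked post. It assumes you have already fetched the matching documents (for example the "hits.hits" array of a search response), and it treats the first hit seen for each EmployeeName value as the "original". The function names, the sample hits, and the index name "my-index" are made up for illustration. The output is a newline-delimited body of delete actions in the shape the _bulk API expects.

```python
import json


def ids_to_delete(hits, field="EmployeeName"):
    """Return the _id of every hit except the first one seen per field value."""
    seen = set()
    duplicates = []
    for hit in hits:
        value = hit["_source"].get(field)
        if value is None:
            continue  # documents without the field are left alone
        if value in seen:
            duplicates.append(hit["_id"])  # a later copy, mark for deletion
        else:
            seen.add(value)  # first occurrence, treated as the original
    return duplicates


def bulk_delete_body(index, doc_ids):
    """Build an NDJSON body with one bulk delete action per document id."""
    lines = [
        json.dumps({"delete": {"_index": index, "_id": doc_id}})
        for doc_id in doc_ids
    ]
    # The bulk format requires a trailing newline after the last action.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical sample hits, shaped like the "hits.hits" array of a search response.
sample_hits = [
    {"_id": "1", "_source": {"EmployeeName": "Alice"}},
    {"_id": "2", "_source": {"EmployeeName": "Bob"}},
    {"_id": "3", "_source": {"EmployeeName": "Alice"}},
]

duplicate_ids = ids_to_delete(sample_hits)
print(duplicate_ids)
print(bulk_delete_body("my-index", duplicate_ids), end="")
```

Which copy counts as the "original" depends entirely on the order of the hits, so sort the search on something meaningful (an ingest timestamp, for instance) before feeding the results in. It is also worth testing on a copy of the index first, since the deletes cannot be undone.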

Hi, thanks for the answer.
I will look into this ASAP and let you know after I try this method.

Thanks for the help. I finally got it working, appreciate it.
