How to retain only one record from duplicate records?

I have a database with duplicate records. For each set of duplicates, I need to delete all the others and retain one.
I have a query that finds the duplicate records:
POST /data_12/_search
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "col2.keyword",
        "min_doc_count": 2,
        "size": 10
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
}

Please help me with the delete query.
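One possible cleanup, sketched in Python rather than as a single delete query: take the response of the aggregation above, keep the first hit in each bucket, and turn every other hit into a bulk delete action. The function name build_delete_actions and the sample response are hypothetical. The action dicts follow the format that elasticsearch.helpers.bulk in the official Python client accepts, but the sketch itself does not talk to a cluster.

```python
def build_delete_actions(search_response,
                         agg_name="duplicateCount",
                         hits_name="duplicateDocuments"):
    """Build bulk delete actions for all but the first hit of each bucket.

    agg_name and hits_name default to the aggregation names used in the
    query above; this helper is an illustration, not a library API.
    """
    actions = []
    buckets = search_response["aggregations"][agg_name]["buckets"]
    for bucket in buckets:
        hits = bucket[hits_name]["hits"]["hits"]
        # hits[0] is the document that is retained; the rest are duplicates.
        for hit in hits[1:]:
            actions.append({
                "_op_type": "delete",
                "_index": hit["_index"],
                "_id": hit["_id"],
            })
    return actions


# Made-up response with one bucket holding three copies of the same col2 value.
sample_response = {
    "aggregations": {
        "duplicateCount": {
            "buckets": [
                {
                    "key": "abc",
                    "doc_count": 3,
                    "duplicateDocuments": {
                        "hits": {
                            "hits": [
                                {"_index": "data_12", "_id": "1"},
                                {"_index": "data_12", "_id": "2"},
                                {"_index": "data_12", "_id": "3"},
                            ]
                        }
                    },
                }
            ]
        }
    }
}

delete_actions = build_delete_actions(sample_response)
# With a connected client, these could be sent with:
#   from elasticsearch import helpers
#   helpers.bulk(client, delete_actions)
```

Because the query returns at most 10 buckets and at most 100 hits per bucket, one pass may not catch everything. Repeating the search and delete until the aggregation returns no buckets would cover the remainder.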

Could you make col2 your choice of ID for the Elasticsearch document to avoid this situation in the first place?

I do not get your point, @Mark_Harwood.

Elasticsearch IDs are the equivalent of a primary key: an ID ensures there’s only one doc with that ID in an index. If you don’t supply an ID (and many don’t), Elasticsearch invents a new unique one. If you do supply an ID, it checks whether a doc with that ID already exists and, if so, replaces that doc instead of adding a second one.

Could you tell me the procedure or show some code? Should I make the change in the ES template or in the query?
If in the query, how do I do it?

It’s not the template or the query. It’s the client that does the write: as well as giving the JSON body, you can also pass an ID in our write APIs.
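As a rough illustration of passing an ID at write time, here is a minimal Python sketch that derives the document ID from col2, so that two records with the same col2 value land on the same document. The helper name to_index_action and the sample records are hypothetical. Hashing the value is an assumption made here to keep IDs short and uniform; using the raw col2 value as the ID would also work if it is short enough.

```python
import hashlib


def to_index_action(index, record, key_field="col2"):
    """Build a bulk index action whose _id is derived from record[key_field].

    Records sharing the same key value get the same _id, so a later write
    replaces the earlier document instead of creating a duplicate.
    """
    key = str(record[key_field])
    # SHA-1 is used only to get a fixed-length ID, not for security.
    doc_id = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return {
        "_op_type": "index",
        "_index": index,
        "_id": doc_id,
        "_source": record,
    }


# Made-up records: the first two share the same col2 value.
records = [
    {"col1": "a", "col2": "same-value"},
    {"col1": "b", "col2": "same-value"},
    {"col1": "c", "col2": "other-value"},
]
index_actions = [to_index_action("data_12", r) for r in records]
# With a connected client, these could be sent with:
#   from elasticsearch import helpers
#   helpers.bulk(client, index_actions)
```

With this scheme the last write for a given col2 value wins. If the first write should win instead, "_op_type": "create" makes the write fail when the ID already exists.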
