How to retain only one record from duplicate records?

sana1 · June 11, 2019, 11:54am

I have a database with duplicate records. I need to delete the duplicates and retain only one copy of each.
I have a query that finds the duplicate records:
POST /data_12/_search
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "col2.keyword",
        "min_doc_count": 2,
        "size": 10
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
}

Please help me with the delete query.
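
One possible way to approach the delete step, sketched in Python: take the JSON response of the search above, keep the first hit in each `duplicateCount` bucket, and turn every remaining hit into a delete action for the bulk API. The function names `surplus_hits` and `bulk_delete_body` are hypothetical helpers made up for this illustration, and the code assumes the response has the shape produced by the `duplicateCount` / `duplicateDocuments` aggregations in the query. It only builds the request body; sending it to the cluster is left to whatever client is in use.

```python
import json


def surplus_hits(agg_response, agg_name="duplicateCount", hits_name="duplicateDocuments"):
    """Return every hit except the first one from each duplicate bucket."""
    extras = []
    for bucket in agg_response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        # hits[0] is the copy we keep; the rest are the duplicates to delete.
        extras.extend(hits[1:])
    return extras


def bulk_delete_body(hits):
    """Build a newline-delimited body for POST /_bulk, one delete action per hit."""
    lines = []
    for hit in hits:
        meta = {"_index": hit["_index"], "_id": hit["_id"]}
        # Assumption: clusters that still use mapping types return "_type"
        # on each hit, so it is copied only when the hit carries one.
        if "_type" in hit:
            meta["_type"] = hit["_type"]
        lines.append(json.dumps({"delete": meta}))
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

Because the query asks for at most 10 buckets and 100 hits per bucket, a single pass covers only part of a large set of duplicates, so the search and delete would need to be repeated until the search returns no buckets.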

Mark_Harwood · June 11, 2019, 12:28pm

Could you make col2 your choice of ID for the Elasticsearch document to avoid this situation in the first place?

sana1 · June 12, 2019, 4:47am

I do not get your point, @Mark_Harwood.

Mark_Harwood · June 12, 2019, 6:40am

An Elasticsearch ID is the equivalent of a primary key: it ensures there’s only one doc with that ID. If you don’t supply an ID (and many don’t), Elasticsearch invents a new unique one. If you do supply an ID, it checks whether a doc with that ID already exists; if one does, a plain index request replaces it rather than adding a second copy.

sana1 · June 12, 2019, 6:43am

Could you please tell me the procedure or show the code? Should I change the ES template or the query?
If it is the query, how do I do it?

Mark_Harwood · June 12, 2019, 7:13am

It’s not the template or the query; it’s the client that does the write. As well as giving the JSON body, you can also pass an ID in our write APIs.
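
As a rough illustration of passing an ID from the client side, the Python sketch below turns each record into a bulk "index" action whose `_id` is taken from the record's `col2` value. The helper name `index_actions` is hypothetical, and the dictionary layout (`_op_type`, `_index`, `_id`, `_source`) is assumed to be consumed by the `elasticsearch.helpers.bulk` function of the official Python client; other clients have their own way of supplying the ID. It also assumes `col2` is present in every record and really is unique per logical record.

```python
def index_actions(index, records, id_field="col2"):
    """Yield one bulk 'index' action per record, reusing id_field as the _id."""
    for record in records:
        yield {
            "_op_type": "index",              # index = create, or replace if the _id exists
            "_index": index,
            "_id": str(record[id_field]),     # same col2 value -> same document
            "_source": record,
        }
```

With this in place, two records that share a `col2` value map to the same `_id`, so the later write replaces the earlier one instead of creating a duplicate. If `col2` values can be very long, hashing them before use as the ID may be worth considering.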

system · July 10, 2019, 7:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
