I have a database with duplicate results. I need to delete the duplicates and retain one copy of each.
I have this query to find the duplicate records:
POST /data_12/_search
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "col2.keyword",
        "min_doc_count": 2,
        "size": 10
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
}
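One possible way to act on the response from this query is to keep the first hit in each bucket and delete the rest. The sketch below is only an illustration: it assumes the response has the usual `aggregations` → `buckets` → `top_hits` shape, the function names `ids_to_delete` and `bulk_delete_body` are made up for this example, and the sample response is invented data. It builds the newline-delimited body for a `_bulk` request but does not send it.

```python
import json


def ids_to_delete(response, agg_name="duplicateCount", hits_name="duplicateDocuments"):
    """Return (index, id) pairs for every hit except the first in each bucket."""
    doomed = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        # hits[0] is the copy we keep; everything after it is treated as a duplicate.
        for hit in hits[1:]:
            doomed.append((hit["_index"], hit["_id"]))
    return doomed


def bulk_delete_body(pairs):
    """Build an NDJSON body of delete actions for POST /_bulk."""
    lines = [
        json.dumps({"delete": {"_index": index, "_id": doc_id}})
        for index, doc_id in pairs
    ]
    # The bulk API expects each action on its own line and a trailing newline.
    return "\n".join(lines) + "\n" if lines else ""


# Invented sample response, trimmed to the fields the functions read.
sample_response = {
    "aggregations": {
        "duplicateCount": {
            "buckets": [
                {
                    "key": "foo",
                    "doc_count": 3,
                    "duplicateDocuments": {
                        "hits": {
                            "hits": [
                                {"_index": "data_12", "_id": "a"},
                                {"_index": "data_12", "_id": "b"},
                                {"_index": "data_12", "_id": "c"},
                            ]
                        }
                    },
                }
            ]
        }
    }
}

print(bulk_delete_body(ids_to_delete(sample_response)))
```

Because the query asks for only 10 buckets and at most 100 hits per bucket, a single pass may not catch everything; repeating the search and delete until no buckets come back is one way around that.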
Elasticsearch IDs are the equivalent of a primary key: they ensure there is only one document with a given ID. If you don't supply an ID (and many people don't), Elasticsearch generates a new unique one. If you do supply an ID, it checks whether a document with that ID already exists.
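To stop duplicates from reappearing, one option is to derive the ID from the field that defines a duplicate, so that two documents with the same value map to the same ID. This is a minimal sketch, assuming `col2` alone identifies a duplicate; the helper name `doc_id_for` and the choice of SHA-1 are illustrative, not anything Elasticsearch requires.

```python
import hashlib


def doc_id_for(value: str) -> str:
    """Derive a stable document ID from the value that defines a duplicate."""
    # Same input always yields the same 40-character hex digest.
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


# Two documents with the same col2 value end up with the same ID.
print(doc_id_for("foo") == doc_id_for("foo"))
```

Indexing with `PUT /data_12/_doc/<id>` then replaces the existing document instead of adding a second copy, while `PUT /data_12/_create/<id>` rejects the write if the ID is already taken.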