I use the following query to find duplicates in an index based on a single field. Documents that share the same value in this field are considered duplicates.
{
  "size": 0,
  "aggs": {
    "duplicate_terms": {
      "terms": {
        "field": "id.keyword",
        "min_doc_count": 2,
        "size": 50
      },
      "aggs": {
        "duplicate_documents": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
}
The ids can have different numbers of duplicates: for example, one id can have 48 documents and another 7. Currently the query returns all the documents for each id, up to 100. Is there a way to set the top_hits size dynamically based on each bucket's doc_count, so that it returns 47 documents for the first id and 6 for the second?
In other words, I would like each bucket to return all of its duplicates except one, i.e. doc_count - 1 documents.
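If this cannot be done inside the query, the fallback I have in mind is trimming each bucket on the client. Below is a minimal sketch of that idea, assuming the search response has already been parsed into a Python dict with the usual aggregations layout. The function name duplicates_except_one is my own, hypothetical, and not part of any Elasticsearch client.

```python
def duplicates_except_one(response):
    """Map each duplicated id to its top_hits documents, minus the first one.

    `response` is assumed to be the parsed JSON body returned for the
    query above (aggregation names match the ones used in that query).
    """
    buckets = response["aggregations"]["duplicate_terms"]["buckets"]
    result = {}
    for bucket in buckets:
        hits = bucket["duplicate_documents"]["hits"]["hits"]
        # Keep the first hit as the "original" and return the rest.
        # This yields doc_count - 1 documents only when doc_count does not
        # exceed the top_hits size (100 here); larger buckets are truncated.
        result[bucket["key"]] = hits[1:]
    return result
```

With this, an id whose bucket has 48 documents would give 47 entries and one with 7 would give 6, but it still transfers the extra document per bucket over the wire, which is what I was hoping to avoid.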