Set top_hits size dynamically for each bucket based on its doc_count with a script

DMinovski · July 18, 2023, 11:31pm

I use the query below to find duplicates in an index based on a field: documents that share the same value in this field are duplicates of each other.

{
    "size": 0,
    "aggs": {
        "duplicate_terms": {
            "terms": {
                "field": "id.keyword",
                "min_doc_count": 2,
                "size": 50
            },
            "aggs": {
                "duplicate_documents": {
                    "top_hits": {
                        "size": 100
                    }
                }
            }
        }
    }
}

The ids can have different numbers of duplicates: for example, one id can have 48 documents and another 7. Right now the query returns all the documents for a specific id, up to 100. Is there a way to set the top_hits size dynamically for every bucket based on its doc_count, so that it returns 47 documents for one id and 6 for another?
In other words, I would like each bucket to return all of its duplicates except one, i.e. doc_count - 1 documents.
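
For illustration, here is a minimal client-side sketch of the "doc_count - 1" behaviour described above. It assumes the top_hits size stays a fixed integer (as in the query above) and that trimming happens after the response comes back; the function name `duplicates_except_one` and the sample response are hypothetical and hand-written, not real cluster output.

```python
from typing import Any, Dict, List


def duplicates_except_one(
    response: Dict[str, Any],
    terms_agg: str = "duplicate_terms",
    hits_agg: str = "duplicate_documents",
) -> Dict[str, List[dict]]:
    """Map each duplicated key to its returned hits minus the first one."""
    result: Dict[str, List[dict]] = {}
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        # Treat the first hit as the copy to keep; the rest are the extras.
        # This yields doc_count - 1 hits only while doc_count <= top_hits size.
        result[bucket["key"]] = hits[1:]
    return result


# Hand-written sample shaped like a search response (hypothetical data).
sample = {
    "aggregations": {
        "duplicate_terms": {
            "buckets": [
                {
                    "key": "a",
                    "doc_count": 3,
                    "duplicate_documents": {
                        "hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}
                    },
                },
                {
                    "key": "b",
                    "doc_count": 2,
                    "duplicate_documents": {
                        "hits": {"hits": [{"_id": "4"}, {"_id": "5"}]}
                    },
                },
            ]
        }
    }
}

extras = duplicates_except_one(sample)
print({key: [hit["_id"] for hit in hits] for key, hits in extras.items()})
```

With a top_hits size of 100, a bucket holding more than 100 documents would still come back capped at 100, so this sketch would return 99 of them rather than doc_count - 1.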

system · August 15, 2023, 11:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
