Composite aggregation ORDER BY

Hello, can the terms source of a composite aggregation be ordered by doc_count (the equivalent of ORDER BY doc_count)?

I currently use "terms": {"field": "city_name.keyword", "order": "desc"}, but I need "terms": {"field": "city_name.keyword", "order": {"_count": "desc"}}.

Is it possible to do this in the composite aggregation?

{
    "size": 0,
    "aggs": {
        "buckets": {
            "composite": {
                "size": 20,
                "sources": [
                    {
                        "city_name": {
                            "terms": {
                                "field": "country_name.keyword",
                                "order": "desc"
                            }
                        }
                    }
                ]
            },
            "aggs": {
                "counts": {
                    "value_count": {
                        "field": "city_name.keyword"
                    }
                }
            }
        }
    }
}

@Christian_Dahlqvist @warkolm

What is the solution in this case? Can you help me? Thanks in advance.

Please do not ping people who are not already involved in the thread. It is fine to bump an issue, though, if you have not received a response in a few days.

Okay, sorry 🙂

Could this be a solution without the composite aggregation?
How good is this approach for performance, and is "size": 1000000000 reasonable?

{
    "size": 0,
    "aggs": {
        "country_name": {
            "terms": {
                "field": "geoip.country_name.keyword",
                "size": 1000000000
            },
            "aggs": {
                "city_name": {
                    "terms": {
                        "field": "geoip.city_name.keyword",
                        "size": 1000000000
                    },
                    "aggs": {
                        "count": {
                            "value_count": {
                                "field": "geoip.city_name.keyword"
                            }
                        },
                        "count_bucket_filter": {
                            "bucket_selector": {
                                "buckets_path": {
                                    "totalCount": "count"
                                },
                                "script": "params.totalCount > 2000"
                            }
                        },
                        "paging": {
                            "bucket_sort": {
                                "from": 0,
                                "size": 10
                            }
                        }
                    }
                },
                "paging": {
                    "bucket_sort": {
                        "from": 0,
                        "size": 10
                    }
                }
            }
        }
    }
}

To answer your first question: no, there isn't a way to order by doc count with the composite aggregation. Ordering would require passing over the entire dataset first and keeping a record of how many docs each term has, which would require memory proportional to the number of terms.

That's the opposite of what the composite agg is made for: it's designed as a memory-friendly way to paginate over aggregations. Part of the tradeoff is that you lose things like ordering by doc count, since that isn't known until after all the docs have been collected.
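To make the pagination model concrete, here is a minimal Python sketch of walking a composite aggregation page by page using the `after_key` from each response. It assumes a single terms source on `city_name.keyword` (taken from the question), and `search` is a hypothetical callable that sends a request body and returns the parsed response; `fake_search` is a made-up in-memory stand-in so the sketch runs without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical type: any function that sends a request body and returns the response dict.
Search = Callable[[Dict[str, Any]], Dict[str, Any]]


def composite_body(after: Optional[Dict[str, Any]] = None, page_size: int = 20) -> Dict[str, Any]:
    """Build one page request; `after` is the previous page's after_key."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [
            # "order" here sorts by the term itself, not by doc_count.
            {"city_name": {"terms": {"field": "city_name.keyword", "order": "desc"}}}
        ],
    }
    if after is not None:
        composite["after"] = after
    return {"size": 0, "aggs": {"buckets": {"composite": composite}}}


def iter_composite_buckets(search: Search, page_size: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield every bucket, one page at a time, in key order (not count order)."""
    after = None
    while True:
        agg = search(composite_body(after, page_size))["aggregations"]["buckets"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            return


def fake_search(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical in-memory stand-in for a search call, with made-up counts."""
    data = [("Tokyo", 5), ("Paris", 9), ("Lima", 2)]
    comp = body["aggs"]["buckets"]["composite"]
    ordered = sorted(data, key=lambda kv: kv[0], reverse=True)  # descending by key
    after = comp.get("after")
    if after is not None:
        ordered = [kv for kv in ordered if kv[0] < after["city_name"]]
    page = ordered[: comp["size"]]
    agg: Dict[str, Any] = {
        "buckets": [{"key": {"city_name": k}, "doc_count": c} for k, c in page]
    }
    if page:
        agg["after_key"] = {"city_name": page[-1][0]}
    return {"aggregations": {"buckets": agg}}


cities = [b["key"]["city_name"] for b in iter_composite_buckets(fake_search, page_size=2)]
```

Only one page of buckets is held at a time, which is why the order has to be the key order: the aggregation never sees all the counts at once.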

Your second approach is theoretically possible, but definitely a very bad idea. Requesting a huge size does what I described above: it keeps a giant list of terms in memory so that they can be sorted. This will lead to memory and performance issues. Newer versions of Elasticsearch have a soft limit on the number of buckets that can be created, to help minimize this problem.

How many results do you need? If you only need a few (10, 100, etc.), you can use the terms aggregation with sorting. If you need the entire dataset, you'll have to page through it with the composite agg and do the sorting client-side, or "page" through it with multiple terms aggregation queries that each look at a small subset of the data.
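As a rough sketch of the first two options in Python: `top_terms_body` builds a terms aggregation request ordered by `_count` for a small top-N, and `sort_by_doc_count` sorts buckets client-side once they have all been collected. The function names, the aggregation name `top`, and the sample buckets are made up for illustration; the `min_count` filter mirrors the `params.totalCount > 2000` bucket_selector from the question.

```python
from typing import Any, Dict, List


def top_terms_body(field: str, n: int) -> Dict[str, Any]:
    """Request body for the top `n` terms of `field`, ordered by doc count."""
    return {
        "size": 0,
        "aggs": {
            "top": {
                "terms": {"field": field, "size": n, "order": {"_count": "desc"}}
            }
        },
    }


def sort_by_doc_count(buckets: List[Dict[str, Any]], min_count: int = 0) -> List[Dict[str, Any]]:
    """Client-side ordering: drop buckets at or below `min_count`, then sort descending."""
    kept = [b for b in buckets if b["doc_count"] > min_count]
    return sorted(kept, key=lambda b: b["doc_count"], reverse=True)


# Made-up buckets, shaped like the ones a composite aggregation returns.
collected = [
    {"key": {"city_name": "Tokyo"}, "doc_count": 2500},
    {"key": {"city_name": "Paris"}, "doc_count": 9000},
    {"key": {"city_name": "Lima"}, "doc_count": 1200},
]
ranked = sort_by_doc_count(collected, min_count=2000)
ranked_names = [b["key"]["city_name"] for b in ranked]
```

Note that the client-side variant still needs every bucket in memory on the client, so it moves the cost out of the cluster but does not remove it.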

Ordering an entire dataset is intrinsically expensive; there's no good way to do it.
