Composite aggregation ORDER BY

gogua · July 11, 2018, 12:23pm

Hello, can composite aggregation terms do ORDER BY doc_count?

I currently use "terms": {"field": "city_name.keyword", "order": "desc"}, but I need "terms": {"field": "city_name.keyword", "order": {"_count": "desc"}}.

Is it possible to do this in the composite aggregation?

{
    "size": 0,
    "aggs": {
        "buckets": {
            "composite": {
                "size": 20,
                "sources": [
                    {
                        "city_name": {
                            "terms": {
                                "field": "country_name.keyword",
                                "order": "desc"
                            }
                        }
                    }
                ]
            },
            "aggs": {
                "counts": {
                    "value_count": {
                        "field": "city_name.keyword"
                    }
                }
            }
        }
    }
}

gogua · July 15, 2018, 7:04am

@Christian_Dahlqvist @warkolm

What is the solution in this case? Can you help me? Thanks in advance.

Christian_Dahlqvist · July 15, 2018, 7:09am

Please do not ping people not already involved in the thread. It is fine to bump an issue, though, if you have not received a response in a few days.

gogua · July 15, 2018, 7:24am

okay, sorry

gogua · July 15, 2018, 7:34am

Could this be a solution without the composite aggregation?
How well does it perform, and is "size": 1000000000 a reasonable value?

{
    "size": 0,
    "aggs": {
        "country_name": {
            "terms": {
                "field": "geoip.country_name.keyword",
                "size": 1000000000
            },
            "aggs": {
                "city_name": {
                    "terms": {
                        "field": "geoip.city_name.keyword",
                        "size": 1000000000
                    },
                    "aggs": {
                        "count": {
                            "value_count": {
                                "field": "geoip.city_name.keyword"
                            }
                        },
                        "count_bucket_filter": {
                            "bucket_selector": {
                                "buckets_path": {
                                    "totalCount": "count"
                                },
                                "script": "params.totalCount > 2000"
                            }
                        },
                        "paging": {
                            "bucket_sort": {
                                "from": 0,
                                "size": 10
                            }
                        }
                    }
                },
                "paging": {
                    "bucket_sort": {
                        "from": 0,
                        "size": 10
                    }
                }
            }
        }
    }
}

polyfractal · July 18, 2018, 5:08pm

To answer your first question, no, there isn't a way to order by doc count with the composite aggregation. Ordering would require passing over the entire dataset first and keeping a record of how many docs each term has, which would require memory equivalent to the number of terms.

That's the opposite of what the composite agg is made for: it's designed as a memory-friendly way to paginate over aggregations. Part of the tradeoff is that you lose things like ordering by doc count, since that isn't known until after all the docs have been collected.
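As a rough illustration of that pagination model, here is a minimal Python sketch that walks every bucket of a composite aggregation one page at a time. The `search` argument is a hypothetical callable (not part of any client library) that takes a request body and returns the parsed response as a dict; the aggregation name `buckets` mirrors the query earlier in the thread. It assumes the response carries an `after_key`, and falls back to the key of the last bucket when it does not.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_composite_buckets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical: body -> response dict
    sources: List[Dict[str, Any]],
    page_size: int = 20,
    agg_name: str = "buckets",
) -> Iterator[Dict[str, Any]]:
    """Yield composite buckets page by page; only one page is held in memory."""
    after = None
    while True:
        composite: Dict[str, Any] = {"size": page_size, "sources": sources}
        if after is not None:
            # Resume right after the last key of the previous page.
            composite["after"] = after
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            return  # an empty page means everything has been consumed
        yield from buckets
        # Assumption: use after_key when present, else the last bucket's key.
        after = result.get("after_key", buckets[-1]["key"])
```

Buckets arrive in key order, not in doc_count order, which is why any ordering by count has to happen after the fact.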

Your second approach is theoretically possible, but definitely a very bad idea. Requesting a huge size does what I described above: it keeps a giant list of terms in memory so that they can be sorted. This will lead to memory and performance issues. Newer versions of Elasticsearch have a soft limit on the number of buckets that can be created to help minimize this problem.

How many results do you need? If you only need a few (10, 100, etc.), you can use the terms aggregation with sorting. If you need the entire dataset, you'll have to page through it with the composite agg and do the sorting client-side, or "page" through it with multiple terms aggregation queries that each look at a small subset of the data.

Ordering an entire dataset is intrinsically expensive; there's not a good way to do it.
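For the client-side sorting option, a minimal Python sketch might look like the following. It assumes the buckets have already been collected from the composite pages as dicts with `key` and `doc_count` fields, and that each composite key appears only once across all pages. The function names and the sample data are illustrative only.

```python
import heapq
from typing import Any, Dict, Iterable, List

Bucket = Dict[str, Any]


def sort_by_doc_count(buckets: Iterable[Bucket]) -> List[Bucket]:
    """Full ordering: every bucket is held in memory on the client."""
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)


def top_by_doc_count(buckets: Iterable[Bucket], n: int) -> List[Bucket]:
    """Top n only: consumes the stream while keeping at most n buckets."""
    return heapq.nlargest(n, buckets, key=lambda b: b["doc_count"])


# Illustrative buckets, in the key order a composite agg would return them.
sample = [
    {"key": {"city_name": "Berlin"}, "doc_count": 5},
    {"key": {"city_name": "Paris"}, "doc_count": 9},
    {"key": {"city_name": "Tbilisi"}, "doc_count": 7},
]
ordered = sort_by_doc_count(sample)
top_two = top_by_doc_count(sample, 2)
```

If only the highest counts matter, `top_by_doc_count` avoids holding the whole result set on the client, although every page still has to be fetched from the cluster.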

system · August 15, 2018, 5:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
