Hello, can the terms source of a composite aggregation be ordered by doc_count (like ORDER BY doc_count)?
I currently use "terms": {"field": "city_name.keyword", "order": "desc"}, but I need this: "terms": {"field": "city_name.keyword", "order": {"_count": "desc"}}
is it possible to do this in the composite aggregation?
To answer your first question: no, there isn't a way to order by doc count with the composite aggregation. Ordering by count would require a pass over the entire dataset first, keeping a record of how many docs each term has, which needs memory proportional to the number of unique terms.
That's the opposite of what the composite agg is made for: it's designed as a memory-friendly way to paginate over aggregations. Part of the tradeoff is that you lose things like ordering by doc count, since the counts aren't known until all the docs have been collected.
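To make the tradeoff concrete, here is a toy, pure-Python simulation of building composite-style pages with memory bounded by the page size. It is not Elasticsearch code and not necessarily how Elasticsearch implements this internally; the function names and the in-memory list standing in for the index are hypothetical. Each page keeps only the `size` smallest keys that sort after the previous page's last key, so doc counts stay exact while memory stays small, but the pages come out in key order.

```python
from typing import Dict, Iterator, List, Optional


def composite_page(values: List[str], size: int, after: Optional[str] = None) -> Dict[str, int]:
    """Return up to `size` buckets (key -> doc_count) whose keys sort after `after`, in key order."""
    kept: Dict[str, int] = {}  # never holds more than `size` keys
    for v in values:
        if after is not None and v <= after:
            continue  # this key was already returned on an earlier page
        if v in kept:
            kept[v] += 1
        elif len(kept) < size:
            kept[v] = 1
        else:
            largest = max(kept)
            if v < largest:
                # v belongs on this page and `largest` does not; once the page is
                # full its largest key only shrinks, so `largest` can't come back.
                del kept[largest]
                kept[v] = 1
            # else: v sorts after every kept key and belongs on a later page
    return dict(sorted(kept.items()))


def paginate(values: List[str], size: int) -> Iterator[Dict[str, int]]:
    """Yield pages in key order, resuming each page after the previous page's last key."""
    after: Optional[str] = None
    while True:
        page = composite_page(values, size, after)
        if not page:
            return
        yield page
        after = max(page)  # plays the role of Elasticsearch's after_key


# Toy stand-in for the city_name.keyword values of the matching docs.
docs = ["Berlin", "Paris", "Berlin", "Oslo", "Paris", "Berlin", "Rome"]
for page in paginate(docs, size=2):
    print(page)
```

With a page size of 2 this prints {'Berlin': 3, 'Oslo': 1} and then {'Paris': 2, 'Rome': 1}. Paris (2 docs) only shows up on the second page, behind Oslo (1 doc): to know it outranks Oslo by count you would need every page first, which is exactly the full pass the composite agg avoids.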
Your second question is theoretically possible, but definitely a very bad idea. Using a huge size does what I described above: it keeps a giant list of terms in memory so that they can be sorted. This will lead to memory and performance issues. Newer versions of Elasticsearch have a soft limit on the number of buckets that can be created (the search.max_buckets setting) to help minimize this problem.
How many results do you need? If you only need a few (10, 100, etc.), you can use the terms aggregation with sorting. If you need the entire dataset, you'll have to page through it with the composite agg and sort client-side, or "page" through it with multiple terms aggregation queries that each look at a small subset of the data.
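A minimal sketch of the composite-plus-client-side option, in Python. It assumes a hypothetical `search` callable that sends a request body to your cluster and returns the parsed JSON response (for example a thin wrapper around whatever client you use), and hypothetical aggregation and source names `cities` and `city`. The request body uses the composite aggregation's `sources`, `size` and `after` parameters; the loop feeds each response's `after_key` back in as `after` until a page comes back empty.

```python
import heapq
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical names for the aggregation and its single terms source.
AGG = "cities"
SOURCE = "city"


def composite_request(after_key: Optional[dict], page_size: int = 1000) -> dict:
    """Search body for one page of a composite terms aggregation on city_name.keyword."""
    composite = {
        "size": page_size,
        "sources": [{SOURCE: {"terms": {"field": "city_name.keyword"}}}],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume after the last bucket of the previous page
    return {"size": 0, "aggs": {AGG: {"composite": composite}}}


def iter_buckets(search: Callable[[dict], dict], page_size: int = 1000) -> Iterator[dict]:
    """Page through every composite bucket; `search` takes a body and returns the response dict."""
    after_key = None
    while True:
        agg = search(composite_request(after_key, page_size))["aggregations"][AGG]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            return


def top_by_count(search: Callable[[dict], dict], n: int, page_size: int = 1000) -> List[Tuple[str, int]]:
    """Client-side ORDER BY doc_count DESC, keeping only the n largest buckets in memory."""
    best = heapq.nlargest(n, iter_buckets(search, page_size), key=lambda b: b["doc_count"])
    return [(b["key"][SOURCE], b["doc_count"]) for b in best]
```

`heapq.nlargest` only holds `n` buckets at a time, so client memory stays small even though every bucket still has to be fetched. If you really need the whole list in count order, swap it for `sorted(..., key=lambda b: b["doc_count"], reverse=True)`, which keeps every term in client memory: the same cost described above, just moved off the cluster.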
Ordering an entire dataset is intrinsically expensive; there's no good way around that.