I'm running a composite aggregation with a sub-aggregation (a sum) in each bucket.
I'm curious whether there is any way to sort the resulting buckets array by this sum, rather than by the natural ordering of the key used for bucketing.
No. Sorting, and paging with the after parameter, rely on the fact that multiple data servers can independently agree on the order of results using a local, concrete attribute: the key of the bucket.
A sum, by contrast, is a global attribute derived from values held on many data servers. It therefore cannot reliably be used to determine a local sort order when each data server selects which buckets to return.
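To make this concrete, here is a toy Python simulation (not Elasticsearch's actual code). Two hypothetical data servers hold partial sums for keys a, b and c. Paging by key returns complete sums, because every server agrees on which keys come next. Ranking by sum forces each server to rank by its own local sum, so each nominates a different key, and the true top bucket c (which only wins once both servers' values are combined) is never returned. The data and all function names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Bucket = Tuple[str, int]

# Hypothetical data: each dict is one data server's partial sums (key -> local sum).
SHARDS: List[Dict[str, int]] = [
    {"a": 10, "b": 1, "c": 9},
    {"a": 1, "b": 10, "c": 9},
]


def merge(responses: Iterable[Iterable[Bucket]]) -> Dict[str, int]:
    """Coordinator step: add up the partial sums each data server returned."""
    totals: Dict[str, int] = defaultdict(int)
    for response in responses:
        for key, value in response:
            totals[key] += value
    return dict(totals)


def by_sum(buckets: Iterable[Bucket]) -> List[Bucket]:
    """Order buckets by descending sum, breaking ties by key."""
    return sorted(buckets, key=lambda kv: (-kv[1], kv[0]))


def page_by_key(shards: List[Dict[str, int]], size: int,
                after: Optional[str] = None) -> List[Bucket]:
    """Composite-style paging: each server returns its first `size` keys after `after`."""
    responses = []
    for shard in shards:
        keys = sorted(k for k in shard if after is None or k > after)[:size]
        responses.append([(k, shard[k]) for k in keys])
    totals = merge(responses)
    return [(k, totals[k]) for k in sorted(totals)[:size]]


def page_by_local_sum(shards: List[Dict[str, int]], size: int) -> List[Bucket]:
    """Sum-ordered paging: each server can only rank keys by its *local* sum."""
    responses = [by_sum(shard.items())[:size] for shard in shards]
    return by_sum(merge(responses).items())[:size]


def exact_top_by_sum(shards: List[Dict[str, int]], size: int) -> List[Bucket]:
    """Ground truth: merge every key from every server before ranking."""
    return by_sum(merge(shard.items() for shard in shards).items())[:size]


if __name__ == "__main__":
    print(page_by_key(SHARDS, 1))             # [('a', 11)]
    print(page_by_key(SHARDS, 1, after="a"))  # [('b', 11)]
    print(page_by_key(SHARDS, 1, after="b"))  # [('c', 18)]
    print(page_by_local_sum(SHARDS, 1))       # [('a', 10)]  wrong key, incomplete sum
    print(exact_top_by_sum(SHARDS, 1))        # [('c', 18)]
```

Key paging is safe because any key that belongs on the global page must also be among each server's first `size` keys. No comparable per-server cutoff exists for a sum. In this toy case a larger `size` happens to surface c, but once there are many more keys than `size`, nothing guarantees the global top buckets survive each server's local cut.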
To reliably select keys sorted by a derived attribute (e.g. max or sum), you need to make sure the set of keys considered in any one request is small enough that all of their values can be examined exhaustively. This is what terms partitioning is for: you limit the analysis to an arbitrary subset of all terms, which helps ensure you get accurate numbers and sort orders within that subset. Repeated calls for different partitions give you multiple correctly sorted sets of keys, but no single overall set.
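A minimal sketch of the partitioned approach, written as Python dicts that mirror the JSON body of an Elasticsearch search request. It assumes a keyword field named account_id and a numeric field named amount (both hypothetical, as are the aggregation names by_key and total). Each body uses a terms aggregation with include.partition and include.num_partitions and orders buckets by the sum sub-aggregation. The assumption is that num_partitions is chosen so each partition holds few enough terms for size to cover all of them.

```python
import json
from typing import Any, Dict, Iterator


def partitioned_sum_requests(
    field: str, sum_field: str, num_partitions: int, size: int
) -> Iterator[Dict[str, Any]]:
    """Yield one search body per terms partition, each ordered by its sum sub-aggregation."""
    for partition in range(num_partitions):
        yield {
            "size": 0,  # only aggregation results are needed, not hits
            "aggs": {
                "by_key": {
                    "terms": {
                        "field": field,
                        # Only terms that fall into this partition are considered.
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        # Assumed large enough to return every term in one partition,
                        # so the ordering by sum is computed exhaustively.
                        "size": size,
                        "order": {"total": "desc"},
                    },
                    "aggs": {"total": {"sum": {"field": sum_field}}},
                }
            },
        }


if __name__ == "__main__":
    # Hypothetical field names; each body would be sent as its own search request.
    for body in partitioned_sum_requests("account_id", "amount", num_partitions=3, size=10000):
        print(json.dumps(body))
```

Each response is correctly sorted within its own partition, but the sketch deliberately does not merge the per-partition lists: getting one global ranking would mean collecting every partition's buckets and sorting them client-side.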