The composite aggregation is an exciting feature, but I still have some questions about it.
In https://github.com/elastic/elasticsearch/pull/26800, the aggregation was first added, with an optimization for indices whose index sort matches the composite source definition.
In https://github.com/elastic/elasticsearch/pull/28745, the composite aggregation was refactored. The refactoring replaced the index-sorting optimization with an execution mode that visits documents in the order of the values of the leading source. This mode applies only when three strict conditions hold:
- The leading source in the composite definition uses an indexed field of type `date` (this also works with the `date_histogram` source), `integer`, `long` or `keyword`.
- The query is a `match_all` query or a `range` query over the field that is used as the leading source in the composite definition.
- The sort order of the leading source is the natural order (ascending, since postings and numerics are sorted in ascending order only).
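As a rough illustration, the three conditions above can be restated as a check on a request. `leading_source_eligible` is a hypothetical helper written only from the conditions as paraphrased in this post, not Elasticsearch's actual eligibility logic. It assumes the caller passes a `field_types` dict taken from the mapping, and it only considers `terms` and `date_histogram` leading sources.

```python
from typing import Any

# Hypothetical restatement of the three conditions; NOT Elasticsearch code.
ELIGIBLE_TYPES = {"date", "integer", "long", "keyword"}


def leading_source_eligible(sources: list[dict[str, Any]],
                            query: dict[str, Any],
                            field_types: dict[str, str]) -> bool:
    # Each source looks like {name: {source_type: {"field": ..., ...}}}
    ((_name, spec),) = sources[0].items()
    ((source_type, params),) = spec.items()
    field = params.get("field")

    # Condition 1: leading source is over an indexed date/integer/long/keyword
    # field (simplification: only terms and date_histogram sources are checked).
    if source_type not in ("terms", "date_histogram"):
        return False
    if field_types.get(field) not in ELIGIBLE_TYPES:
        return False

    # Condition 3: natural (ascending) order on the leading source.
    if params.get("order", "asc") != "asc":
        return False

    # Condition 2: match_all, or a range query on the leading source's field.
    if "match_all" in query:
        return True
    return "range" in query and set(query["range"]) == {field}
```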
I wonder why the previous index-sorting optimization could not be retained. Its condition (an index sort that matches the composite sources) seems easier to meet, and it allows the aggregation to terminate early on each segment without any restriction on the leading source or the query.
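To make the index-sorting argument concrete, here is a simplified model, not Elasticsearch's implementation. A segment is a list of composite key tuples holding only the documents that matched the query, in doc-ID order. The hypothetical functions `collect_sorted` and `collect_unsorted` each return the first `size` buckets after an optional `after` key, plus the number of documents visited. When the segment is sorted by the composite key, all documents for a key are contiguous, so collection can stop at the first new key once `size` buckets are full, whatever query produced the matches.

```python
from collections import Counter
from typing import Optional

# Hypothetical model: each document is represented only by its composite key,
# e.g. (day, product). Lists hold the docs that matched the query, in doc-ID order.


def collect_sorted(segment: list[tuple], size: int,
                   after: Optional[tuple] = None) -> tuple[dict, int]:
    """Assumes the segment is sorted by composite key (index sorting)."""
    buckets: dict[tuple, int] = {}
    visited = 0
    for key in segment:
        visited += 1
        if after is not None and key <= after:
            continue  # bucket was already returned on an earlier page
        if key not in buckets:
            if len(buckets) == size:
                # Keys only grow from here on, so no remaining doc can
                # produce a smaller bucket: stop scanning this segment.
                break
            buckets[key] = 0
        buckets[key] += 1
    return buckets, visited


def collect_unsorted(segment: list[tuple], size: int,
                     after: Optional[tuple] = None) -> tuple[dict, int]:
    """No ordering guarantee: every doc must be visited before choosing buckets."""
    counts = Counter(k for k in segment if after is None or k > after)
    smallest = sorted(counts)[:size]
    return {k: counts[k] for k in smallest}, len(segment)
```

With six documents sorted as `("2024-01-01", "a")` ×2, `("2024-01-01", "b")`, `("2024-01-02", "a")`, `("2024-01-03", "c")` ×2 and `size=2`, the sorted scan stops after reading 4 documents, while the unsorted scan must read all 6 to return the same two buckets.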
Grateful for any help!
boice