At my company we use a terms aggregation with include partitioning (partition / num_partitions) to paginate the results.
I noticed that in newer versions you have a composite aggregation which also allows pagination.
Which do you think is a better solution?
Is there a performance difference between the methods?
The composite aggregation relies on ordering buckets by a key that is derived from values in a single document. As a simple example, you could order buckets by IPAddress, or by the domain name extracted from a referrerUrl field. Each of these buckets can have sub-aggregations, e.g. the max of a date field (to know the last time you saw each one), but you can't order the top-level buckets on this child agg "max date" property, because it is derived from multiple docs.
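To make the composite approach concrete, here is a minimal sketch in Python that builds the request body and pages through the buckets using the after key. The field names IPAddress and date come from the example above; the aggregation names by_ip and last_seen, the page size, and the `search` callable (anything that sends a body to Elasticsearch and returns the parsed JSON response) are assumptions made purely for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def composite_body(after_key: Optional[Dict[str, Any]] = None,
                   page_size: int = 1000) -> Dict[str, Any]:
    """Build a search body that buckets by IP, ordered by the bucket key."""
    composite: Dict[str, Any] = {
        "size": page_size,
        # The bucket key comes from a single field value per document.
        "sources": [{"ip": {"terms": {"field": "IPAddress"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,  # no hits needed, only aggregation buckets
        "aggs": {
            "by_ip": {
                "composite": composite,
                # A sub-aggregation is allowed, but it cannot drive the ordering.
                "aggs": {"last_seen": {"max": {"field": "date"}}},
            }
        },
    }


def iter_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]]
                 ) -> Iterator[Dict[str, Any]]:
    """Yield every bucket, following after_key until the results run out."""
    after_key = None
    while True:
        agg = search(composite_body(after_key))["aggregations"]["by_ip"]
        buckets = agg["buckets"]
        yield from buckets
        after_key = agg.get("after_key")
        if not buckets or after_key is None:
            break
```

Buckets come back in key order (by IP here), so any "least recently seen" ranking would have to be done on the client after collecting all pages.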
The terms aggregation can order by the values of child aggregations, e.g. reverse-sort on the max date value for an IP address to find IP addresses that haven't been active in a while. The downside is that if the number of unique top-level buckets (IP addresses in this case) is large, you may have to use partitioning to ensure results are accurate within each arbitrary subset of the data.
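For comparison, a partitioned terms request ordered by a child aggregation might look roughly like the sketch below. It reuses the IPAddress and date fields from the example; the aggregation names, the bucket size, the ascending order (my reading of "haven't been active in a while" as oldest last-seen first) and the `search` callable are all assumptions for illustration.

```python
from typing import Any, Callable, Dict, Iterator


def partitioned_terms_body(partition: int, num_partitions: int,
                           size: int = 1000) -> Dict[str, Any]:
    """Build a terms request for one partition, ordered by max date."""
    return {
        "size": 0,  # no hits needed, only aggregation buckets
        "aggs": {
            "by_ip": {
                "terms": {
                    "field": "IPAddress",
                    # Only terms that hash into this partition are considered.
                    "include": {"partition": partition,
                                "num_partitions": num_partitions},
                    "size": size,
                    # Order on the child aggregation: oldest last-seen first.
                    "order": {"last_seen": "asc"},
                },
                "aggs": {"last_seen": {"max": {"field": "date"}}},
            }
        },
    }


def iter_partitions(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                    num_partitions: int) -> Iterator[Dict[str, Any]]:
    """Run one request per partition and yield the buckets of each in turn."""
    for partition in range(num_partitions):
        response = search(partitioned_terms_body(partition, num_partitions))
        yield from response["aggregations"]["by_ip"]["buckets"]
```

Each partition is sorted only within itself, so a global ordering still needs a merge step on the client, and num_partitions has to be chosen so that the terms in one partition fit within size.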