Hi,
At my company we use a terms aggregation with the include option (partition / num_partitions) to paginate the results.
I noticed that in newer versions you have a composite aggregation which also allows pagination.
Which do you think is a better solution?
Is there a performance difference between the methods?
Thanks,
Gary
The composite aggregation relies on ordering buckets by a key that is derived from values in a single document. A simple example would be ordering buckets by IPAddress, or by a domain name extracted from a referrerUrl field. Each of these buckets can have sub-aggregations, e.g. the max of a date field (to know the last time you saw them), but you can't order the top-level buckets on this child agg's "max date" value, because it is derived from multiple docs.
The terms aggregation can order by the values of child aggregations, e.g. reverse-sort on the max date value for an IP address to find IP addresses that haven't been active in a while. The downside is that if the number of unique top-level buckets (IP addresses in this case) is large, you may have to use partitioning to ensure results are accurate within each arbitrary subset of the data.
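To make the partitioned terms approach concrete, here is a minimal sketch of the request bodies it involves. The field names (ip_address, timestamp), the aggregation names (by_ip, last_seen) and the partition count of 20 are assumptions for illustration only; they are not from the original question.

```python
def partitioned_terms_request(partition, num_partitions, size=1000):
    """Build a search body for one partition of a terms aggregation."""
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "by_ip": {  # hypothetical aggregation name
                "terms": {
                    "field": "ip_address",  # hypothetical field name
                    # restrict this request to one slice of the unique terms
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    "size": size,
                    # order by the child agg: oldest "last seen" first
                    "order": {"last_seen": "asc"},
                },
                "aggs": {
                    # hypothetical date field
                    "last_seen": {"max": {"field": "timestamp"}}
                },
            }
        },
    }


# One request body per partition; each is sent as a separate search.
bodies = [partitioned_terms_request(p, 20) for p in range(20)]
```

Note that the ordering only holds within each partition, so a globally sorted list still needs a merge step on the client.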
I am referring to the ability to paginate a composite aggregation that has a single terms source, using the after parameter:
"If the number of composite buckets is too high (or unknown) to be returned in a single response it is possible to split the retrieval in multiple requests."
See:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html#_after
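For comparison, paging a composite aggregation with after might look roughly like the sketch below. The field name ip_address, the aggregation name by_ip, and the search callable are assumptions: search stands for whatever function sends the body to the _search endpoint and returns the parsed JSON response.

```python
def composite_request(after_key=None, page_size=1000):
    """Build a search body for one page of a composite aggregation."""
    composite = {
        "size": page_size,
        # a single terms source; "ip" and "ip_address" are hypothetical names
        "sources": [{"ip": {"terms": {"field": "ip_address"}}}],
    }
    if after_key is not None:
        # resume after the last bucket key of the previous page
        composite["after"] = after_key
    return {"size": 0, "aggs": {"by_ip": {"composite": composite}}}


def iter_composite_buckets(search, page_size=1000):
    """Yield all buckets, one page per request, following after_key."""
    after_key = None
    while True:
        response = search(composite_request(after_key, page_size))
        agg = response["aggregations"]["by_ip"]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        # stop once a page comes back empty or without an after_key
        if after_key is None or not agg["buckets"]:
            break
```

Unlike the partitioned terms approach, there is no need to guess a partition count up front; each response says where the next page starts.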
Ah OK - so you've no need for ordering by child aggs etc. I'd expect Composite to be faster for this use case then.