What I take away from your explanation and a re-read of the relevant doc warnings is that a "top N" terms query is dangerous due to potentially missing or underrepresented term buckets sent back by the shards, leading to inaccurate final sums/counts/aggs at the coordinator, which in turn can lead to incorrect sorting when choosing the top N.
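(Concretely, with made-up numbers: say `shard_size` is 3 and term `x` ranks 4th on shard 2 but 1st on shards 1 and 3. Shard 2 never reports `x`, so the coordinator's sum for `x` silently omits shard 2's contribution, and that undercount can drop `x` out of the final top N.)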
I understand inaccuracies due to approximation to be the case for A) below, but NOT for B) or C). Can you confirm?
- A) A `terms` agg where `shard_size` < term cardinality, and no routing key for the term (... `shard_size` defaults to `floor(1.5*size)+10`)
- B) A `composite` agg that uses a `terms` source
- C) A `terms` agg that uses `include.partition` / `include.num_partitions`

(All three shapes are sketched below.)
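For concreteness, here's roughly what I mean by each shape (a minimal sketch assuming the 8.x Python client; the index `my-index` and the fields `my_keyword` and `price` are made up):

```python
from elasticsearch import Elasticsearch  # assuming the 8.x Python client

es = Elasticsearch("http://localhost:9200")

# Shared metric: sum of a numeric field per term bucket.
metric = {"total": {"sum": {"field": "price"}}}

# A) Plain terms agg: each shard returns only its local top `shard_size`
# terms, so a term's coordinator-side sum can be missing shard contributions.
agg_a = {
    "top_terms": {
        "terms": {"field": "my_keyword", "size": 10, "shard_size": 25},
        "aggs": metric,
    }
}

# B) Composite agg with a terms source: exhaustive, paged via `after`.
agg_b = {
    "all_terms": {
        "composite": {
            "size": 1000,
            "sources": [{"term": {"terms": {"field": "my_keyword"}}}],
        },
        "aggs": metric,
    }
}

# C) Partitioned terms agg: each request covers one disjoint slice of terms.
agg_c = {
    "partitioned": {
        "terms": {
            "field": "my_keyword",
            "include": {"partition": 0, "num_partitions": 20},
            "size": 10_000,
        },
        "aggs": metric,
    }
}

resp = es.search(index="my-index", size=0, aggs=agg_a)  # likewise for agg_b / agg_c
```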
Some sample tests across a small 3-node, 3-shard cluster seemed to show B) and C) did not suffer from the same inaccuracy problem with `sum` that A) did.
Sorting to get "top N" with B) or C) is a different challenge. To get accurate "top N" buckets, it seems I'd need to pull all terms' buckets to the client side and compare them in a global sort. To do that:
- B) Sequentially paginating all buckets one page at a time using the `after` key, pulling every one back to the client side, and doing a global sort of the buckets' sums to get the top N (see the sketch after this list)
- C) Requesting every partition from the cluster, each being a disjoint set of terms (by way of hashing each term against the number of partitions), accumulating those on the client side, and doing a global sort of the buckets' sums to get the top N
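Roughly what I have in mind for both, as a client-side sketch (same hypothetical names and 8.x Python client as above; not tuned for production):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
METRIC = {"total": {"sum": {"field": "price"}}}

def top_n_via_composite(n: int, page_size: int = 1000) -> list:
    """B) Walk every composite page via after_key, then sort client-side."""
    buckets, after = [], None
    while True:
        composite_body = {
            "size": page_size,
            "sources": [{"term": {"terms": {"field": "my_keyword"}}}],
        }
        if after is not None:
            composite_body["after"] = after
        resp = es.search(
            index="my-index",
            size=0,
            aggs={"all_terms": {"composite": composite_body, "aggs": METRIC}},
        )
        page = resp["aggregations"]["all_terms"]
        buckets.extend(page["buckets"])
        after = page.get("after_key")
        if after is None:  # an empty final page carries no after_key
            break
    buckets.sort(key=lambda b: b["total"]["value"], reverse=True)  # global sort
    return buckets[:n]

def top_n_via_partitions(n: int, num_partitions: int = 20) -> list:
    """C) Fetch each disjoint partition of terms, then sort client-side."""
    buckets = []
    for p in range(num_partitions):
        resp = es.search(
            index="my-index",
            size=0,
            aggs={
                "part": {
                    "terms": {
                        "field": "my_keyword",
                        "include": {"partition": p, "num_partitions": num_partitions},
                        # must exceed the partition's term count, or the
                        # truncation problem from A) sneaks back in
                        "size": 10_000,
                    },
                    "aggs": METRIC,
                }
            },
        )
        buckets.extend(resp["aggregations"]["part"]["buckets"])
    buckets.sort(key=lambda b: b["total"]["value"], reverse=True)  # global sort
    return buckets[:n]
```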
So to solve this use case with B) (composite) or C) (partitions), I'll need to siphon all buckets to the client first to do the final sorting pass, yes?
Now back to A). If documents with the same term are all sent to the same shard by way of a routing key, that does seem to solve the potentially inaccurate sums. Does a term-based routing key at index time also allow "top N" queries to be answered without looking at all the buckets (server-side or client-side)?
Seems like it would because every shard could confidently send its top N candidates forward to the coordinator and let it sort out the final N. Am I missing some subtle corner case where a top N candidate wouldn't make it to the coordinator?
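For reference, the routing setup I'm picturing (same hypothetical names and 8.x Python client as above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index time: route each doc by its term, so every doc for a given term
# lands on the same shard.
doc = {"my_keyword": "acme", "price": 12.5}
es.index(index="my-index", document=doc, routing=doc["my_keyword"])

# Query time: each shard's local sum for a term is now that term's global
# sum, so the coordinator merging each shard's local top N should be exact.
resp = es.search(
    index="my-index",
    size=0,
    aggs={
        "top_terms": {
            "terms": {
                "field": "my_keyword",
                "size": 10,
                "order": {"total": "desc"},  # rank terms by their sum
            },
            "aggs": {"total": {"sum": {"field": "price"}}},
        }
    },
)
```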
I'd just be left with a deep-pagination problem. e.g. if someone wanted page 100,001, to look at items 1,000,001-1,000,010 ... I'd need the server to process a terms agg with `"size": 1000010` (ouch). Or fall back to B) or C) to make the client suffer instead of the server. Maybe a reasonable compromise here is to limit deep pagination, and have it stop at the 1,000th page (the 10,000th item for pages of 10) to stay under the 10k `search.max_buckets` limit.
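Put as arithmetic: serving 1-based page `p` of `s` items from a terms agg means the server materializes `p * s` buckets. A tiny guard for that compromise (the 10k cap is what I'm assuming for `search.max_buckets`):

```python
MAX_BUCKETS = 10_000  # assumed search.max_buckets default

def required_agg_size(page: int, page_size: int = 10) -> int:
    """Buckets the server must build and sort to serve 1-based `page`."""
    size = page * page_size  # e.g. page 100,001 of 10 -> 1,000,010 buckets
    if size > MAX_BUCKETS:
        raise ValueError(f"page {page} is past the {MAX_BUCKETS}-bucket cap")
    return size
```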
Sorry to ramble through this. Any inaccuracies in these assessments?