Max_doc_count in terms aggregation


(Gili Sade) #1

Hi,
I'm using ES 5.4.
Is there a way to set a max doc count, like min_doc_count, in a terms aggregation?

I want to get back only the first 200 buckets (size: 200) that each contain at most 2000 docs.

Here is what I ran into:
I have a terms aggregation with size 200, ordered by _count "desc".

Then I have a bucket_selector pipeline agg with the script:
"params._count <= 2000"
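For reference, the setup described above might look like the following search body, built here as a Python dict. This is a sketch only: the field name `my_field` and the aggregation names `by_term` and `max_doc_count` are hypothetical, and it assumes the ES 5.x shorthand of passing an inline script as a plain string.

```python
import json


def build_terms_with_selector(field="my_field", size=200, max_count=2000):
    """Build the ES 5.x search body described above (field and agg names are hypothetical)."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "by_term": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"_count": "desc"},
                },
                "aggs": {
                    "max_doc_count": {  # hypothetical pipeline agg name
                        "bucket_selector": {
                            # Expose each bucket's doc count to the script as params._count.
                            "buckets_path": {"_count": "_count"},
                            "script": "params._count <= %d" % max_count,
                        }
                    }
                },
            }
        },
    }


body = build_terms_with_selector()
print(json.dumps(body, indent=2))
```

The bucket_selector sits inside the terms agg's own "aggs" because it is a parent pipeline aggregation: it filters the buckets of the aggregation it is nested under.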

What happens is that none of the 200 buckets returned by the terms aggregation has fewer than 2000 docs.

When sorting asc we do get buckets with fewer than 2000 docs.
Since ordering by the pipeline agg is impossible, I need a way to do the count filtering inside the terms aggregation itself.
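The empty result follows from the order of operations: the terms aggregation truncates to `size` buckets before the bucket_selector sees them, so the pipeline can only remove buckets, never reach further down the list. Below is a single-node Python model of that behavior. It ignores sharding entirely, and the term names and counts are hypothetical.

```python
def top_n_then_select(counts, size, max_count, descending=True):
    """Single-node model: the terms agg keeps `size` buckets, then bucket_selector filters them."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=descending)
    kept = ordered[:size]  # truncation happens before the pipeline agg runs
    return [(term, c) for term, c in kept if c <= max_count]


# Hypothetical data: 500 terms with doc counts 5000, 4990, ..., 10.
counts = {"term%d" % i: 5000 - 10 * i for i in range(500)}

desc_result = top_n_then_select(counts, size=200, max_count=2000)
asc_result = top_n_then_select(counts, size=200, max_count=2000, descending=False)
print(len(desc_result), len(asc_result))  # 0 200
```

With descending order the 200 kept buckets run from 5000 down to 3010 docs, so the selector drops all of them; ascending order keeps the tail, which is why it appears to work.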

What we need can be achieved with the composite agg, but it's only available in ES 6.1 and up,
and we cannot upgrade our environment in the near future.

Any suggestions?


(Zachary Tong) #2

The only safe, correct way to do it without the composite aggregation is to sort by desc and specify a large enough size that you start getting buckets with < 2000 docs.
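One way to apply this advice is a client-side loop: request the terms agg ordered by _count desc, keep the buckets at or under the limit, and grow `size` until 200 qualifying buckets appear or every term has been returned. This is a sketch under stated assumptions. `run_query`, `fetch_until_enough`, and `first_n_under` are hypothetical helpers, and the doubling strategy and `max_size` cap are my own choices, not anything Elasticsearch provides.

```python
def first_n_under(buckets, n, max_count, requested_size):
    """buckets: (term, doc_count) pairs in the order returned (doc_count descending)."""
    selected = [b for b in buckets if b[1] <= max_count][:n]
    # Fewer buckets than requested means every term was returned,
    # so the answer is complete even if it holds fewer than n buckets.
    exhausted = len(buckets) < requested_size
    return selected, len(selected) == n or exhausted


def fetch_until_enough(run_query, n=200, max_count=2000, size=1000, max_size=100000):
    """Double the terms `size` until n qualifying buckets are found (or max_size is hit)."""
    while True:
        buckets = run_query(size)
        selected, complete = first_n_under(buckets, n, max_count, size)
        if complete or size >= max_size:
            return selected, size
        size *= 2


# Hypothetical index: 500 terms with doc counts 5000, 4990, ..., 10.
fake_counts = [5000 - 10 * i for i in range(500)]


def fake_run_query(size):
    # Stand-in for a terms agg ordered by _count desc with the given size.
    return [("term%d" % i, c) for i, c in enumerate(fake_counts[:size])]


result, used_size = fetch_until_enough(fake_run_query, size=100)
print(used_size, len(result), result[0], result[-1])
```

The filtering step could instead stay server-side in the bucket_selector. The loop still only decides how large `size` needs to be.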

Ordering by ascending _count is unreliable and entirely likely to give you incorrect doc counts. The error is unbounded. See the callout in: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order
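The error is unbounded because each shard returns only its own lowest-count terms, and the coordinating node sums only what it receives. A term can look rare on one shard and be very common on another. The simplified two-shard model below uses hypothetical data and sets shard_size to 1 so the effect is easy to see. Real clusters usually request more than `size` terms per shard, which makes the problem less frequent but does not remove it.

```python
from collections import Counter


def shard_lowest(shard_counts, shard_size):
    """Each shard returns only its own lowest-count terms."""
    return sorted(shard_counts.items(), key=lambda kv: kv[1])[:shard_size]


def merged_ascending(shards, size, shard_size):
    """Coordinator sums whatever the shards sent and re-sorts ascending."""
    merged = Counter()
    for shard in shards:
        for term, c in shard_lowest(shard, shard_size):
            merged[term] += c
    return sorted(merged.items(), key=lambda kv: kv[1])[:size]


# Hypothetical two-shard index.
shards = [
    {"a": 1, "b": 50, "c": 60},
    {"a": 1000, "b": 2, "c": 70},
]
true_totals = sum((Counter(s) for s in shards), Counter())

reported = merged_ascending(shards, size=1, shard_size=1)
print(reported, true_totals["a"])
```

Term `a` is reported as the rarest term with 1 doc, although it has 1001 docs and is actually the most common. The missing count comes from the shard that never reported `a`, and that count could be arbitrarily large.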

It's a feature we'd dearly like to remove, since sorting by _count ascending is no better than just randomly generating doc counts :frowning:

Note: there is also a bucket_sort pipeline agg which can be used to order the buckets, but you'll still need a large enough size to collect all the correct buckets first.
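A possible shape for that combination: collect a large desc-ordered terms agg, drop buckets over the limit with bucket_selector, then let bucket_sort keep the first 200. This is a hedged sketch, not a verified request. bucket_sort may not be available in ES 5.4 (check the docs for your version). The sketch assumes the bucket_selector is applied before bucket_sort truncates to 200, which you should verify on your version. The names `my_field`, `by_term`, `max_doc_count`, and `top_n`, and the collect size of 10000, are hypothetical.

```python
import json


def build_request_with_bucket_sort(field="my_field", collect_size=10000, n=200, max_count=2000):
    """Terms agg sized to collect enough buckets, then filter and truncate via pipelines."""
    return {
        "size": 0,
        "aggs": {
            "by_term": {
                # collect_size must be large enough to reach buckets under max_count.
                "terms": {"field": field, "size": collect_size, "order": {"_count": "desc"}},
                "aggs": {
                    "max_doc_count": {
                        "bucket_selector": {
                            "buckets_path": {"_count": "_count"},
                            "script": "params._count <= %d" % max_count,
                        }
                    },
                    "top_n": {
                        # Keep the n largest remaining buckets.
                        "bucket_sort": {"sort": [{"_count": {"order": "desc"}}], "size": n}
                    },
                },
            }
        },
    }


sort_body = build_request_with_bucket_sort()
print(json.dumps(sort_body, indent=2))
```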

