Easy way to get a terms bucket count stats?

Hi,

We need to get some extended_stats on the bucket counts of a terms aggregation, i.e. we want stats (avg, stddev, etc.) on the frequency of each term. So we do: terms {...} extended_stats { path:"terms._count" }, which works. The problem is that there are a fairly high number of terms, and we also want them wrapped in a date_histogram, so we hit the max buckets limit.
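For reference, a sketch of the request described above. The field names `timestamp` and `my_term` are placeholders, the stats-on-counts part is spelled `extended_stats_bucket` (the pipeline form) in the query DSL, and the interval parameter name varies by Elasticsearch version:

```json
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "1d" },
      "aggs": {
        "term_counts": {
          "terms": { "field": "my_term", "size": 100 }
        },
        "count_stats": {
          "extended_stats_bucket": { "buckets_path": "term_counts._count" }
        }
      }
    }
  }
}
```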

Since I don't actually need the terms in 'buckets', is there a shortcut that avoids hitting the barrier?

Thanks!
Dan.

Have you considered using the cardinality aggregation? This aggregation returns the number of unique values, which should be the same as the total number of buckets from a terms aggregation.

One thing to be aware of is that the cardinality aggregation returns an approximation of the unique count. Depending on your use case, that may or may not be a problem.
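A minimal sketch, assuming a field called `my_term` (the optional `precision_threshold` parameter trades memory for accuracy of the approximation):

```json
{
  "size": 0,
  "aggs": {
    "unique_terms": {
      "cardinality": { "field": "my_term", "precision_threshold": 3000 }
    }
  }
}
```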

Hi, thanks Abdon,

I want stats on how many times each term is duplicated. E.g. if at one time bucket, the value 'A' occurs 1000 times, that's different to another time bucket where there are 500 terms, each occurring twice.

Thanks, Dan.

In that case you may want to look at the composite aggregation. This aggregation allows you to paginate through all the buckets, without hitting the limit.
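A sketch of what that could look like, again with placeholder field names. Each response includes an `after_key`, which you pass back as `after` in the next request to fetch the following page of buckets:

```json
{
  "size": 0,
  "aggs": {
    "term_pages": {
      "composite": {
        "size": 1000,
        "sources": [
          { "day": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d" } } },
          { "term": { "terms": { "field": "my_term" } } }
        ]
      }
    }
  }
}
```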

The maximum number of buckets is a "soft limit" by the way. You could change it with the search.max_buckets cluster setting - but be aware that this may cause Elasticsearch to run out of memory.
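For example (the value 20000 here is arbitrary; raise it with care, for the memory reason mentioned above):

```json
PUT _cluster/settings
{
  "transient": {
    "search.max_buckets": 20000
  }
}
```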

Thanks, but that leads to a new problem: does my app need to combine the time buckets, or can that also be done in ES, e.g. with a pipeline aggregation?
Also, I noticed that my auto_date_histogram targets 10 buckets and my terms aggregation is limited to 100 terms (i.e. at most 10*100 = 1000 buckets), but I still hit the max buckets problem. Am I missing something?

Unfortunately pipeline aggregations are not supported with composite aggregations. That may change in the future. You can follow the conversation on this topic here, if you're interested.

I'm not sure why you're hitting the max buckets limit with only 1000 buckets.