We need to get some extended_stats on the bucket counts of a terms aggregation, i.e. we want to know the stats (avg, stddev etc.) of the frequency of terms. So we do: terms {...} extended_stats { path:"terms._count" }, which works. The problem is that there are a fairly high number of terms, and we also want them wrapped in a date_histogram, so we hit the max buckets limit.
Since I don't actually need the terms in 'buckets', is there a shortcut that avoids hitting the barrier?
Have you considered using the cardinality aggregation? This aggregation returns the number of unique values, which should be the same as the total number of buckets from a terms aggregation.
One thing to be aware of is that the cardinality aggregation returns an approximation of the unique count. Depending on your use case that may or may not be a problem.
I want stats on how many times each term is duplicated. E.g. if in one time bucket the value 'A' occurs 1000 times, that's different from another time bucket where there are 500 terms, each occurring twice.
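To make the distinction concrete, here is a small Python sketch of the two time buckets described above. The function name and the shape of the output are my own choices for illustration, not an Elasticsearch API. Both buckets hold 1000 documents, but the per-term frequency stats differ.

```python
from statistics import mean, pstdev

def frequency_stats(term_counts):
    """Stats over how often each term occurs (one value per term)."""
    counts = list(term_counts.values())
    return {
        "count": len(counts),             # number of distinct terms
        "min": min(counts),
        "max": max(counts),
        "avg": mean(counts),
        "std_deviation": pstdev(counts),  # population standard deviation
    }

# Time bucket 1: a single term 'A' occurring 1000 times.
bucket_a = {"A": 1000}
# Time bucket 2: 500 distinct terms, each occurring twice.
bucket_b = {f"term_{i}": 2 for i in range(500)}

print(frequency_stats(bucket_a))
print(frequency_stats(bucket_b))
```

A cardinality aggregation would only give the `count` field here (1 versus 500), not the average or spread of the frequencies.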
In that case you may want to look at the composite aggregation. This aggregation allows you to paginate through all the buckets, without hitting the limit.
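A rough sketch of what that pagination could look like from the client side, in Python. Assumptions on my part: the field names `@timestamp` and `my_field`, the aggregation name `pairs`, the one-day `calendar_interval`, and the `search` callable (any function that takes a request body and returns the parsed search response) are all placeholders. The stats are computed in the application, since each page only returns (time, term) pairs with their `doc_count`.

```python
from collections import defaultdict
from statistics import mean, pstdev

def composite_body(after_key=None, page_size=1000):
    """Build one page of a composite aggregation request (hypothetical fields)."""
    composite = {
        "size": page_size,
        "sources": [
            {"time": {"date_histogram": {"field": "@timestamp",
                                         "calendar_interval": "1d"}}},
            {"term": {"terms": {"field": "my_field"}}},
        ],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume after the previous page
    return {"size": 0, "aggs": {"pairs": {"composite": composite}}}

def term_counts_per_time_bucket(search):
    """Page through all (time, term) buckets; collect doc counts per time key."""
    counts = defaultdict(list)
    after_key = None
    while True:
        agg = search(composite_body(after_key))["aggregations"]["pairs"]
        for bucket in agg["buckets"]:
            counts[bucket["key"]["time"]].append(bucket["doc_count"])
        after_key = agg.get("after_key")
        if after_key is None or not agg["buckets"]:
            break
    return counts

def stats_per_time_bucket(counts):
    """Client-side equivalent of stats over term frequencies per time bucket."""
    return {
        time_key: {"count": len(c), "avg": mean(c), "std_deviation": pstdev(c)}
        for time_key, c in counts.items()
    }
```

With a real client, `search` would be a thin wrapper around whatever search call you already use; only the list of counts per time bucket is kept in memory, not the term values.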
The maximum number of buckets is a "soft limit" by the way. You could change it with the search.max_buckets cluster setting - but be aware that this may cause Elasticsearch to run out of memory.
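For reference, the setting is changed through the cluster settings API (`PUT _cluster/settings`). A minimal Python sketch of the request body follows; the value 20000 is an arbitrary example, not a recommendation, and the helper name is made up.

```python
import json

def max_buckets_settings(limit):
    """Request body for PUT _cluster/settings raising search.max_buckets."""
    return {"persistent": {"search.max_buckets": limit}}

# Example value only; pick a limit your cluster's memory can sustain.
print(json.dumps(max_buckets_settings(20000), indent=2))
```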
Thanks, that leads to a new problem: does my app need to combine the time buckets? Or can that also be done with e.g. a pipeline aggregation in ES?
Also, I noticed that my auto_date_histogram targets 10 buckets, and my terms aggregation is limited to 100 terms (i.e. 10*100=1000 buckets), but I still get the max buckets problem. Am I missing something?
Unfortunately pipeline aggregations are not supported with composite aggregations. That may change in the future. You can follow the conversation on this topic here, if you're interested.
I'm not sure why you're hitting the max buckets limit with only 1000 buckets.