Max_buckets exception from Elasticsearch

Hi,

I have created a Controls visualization (options list) with 4 option fields (let's say A, B, C and D).
The "size" of each option is 10000, and the search.max_buckets setting is also 10000.

In Elasticsearch, field A has 2 distinct values, A1 and A2.
There are 14800000 documents with value A1 in field A, and they have 1028 distinct values in field B.
There are 3700000 documents with value A2 in field A, and they have 3500 distinct values in field B.

In the Controls visualization, when I select the value A1 for field A, I get the error "5 of 12 shards failed", and when I inspect the request and the response from Elasticsearch I see:

"type": "too_many_buckets_exception",
"reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",

But when I select A2, everything is OK.

Why is the error not shown for value A2?
Is it because, by a simple calculation, the number of buckets for A1 is 14800000 / 1028 ≈ 14396, which is greater than max_buckets or "size",
while for A2 it is 3700000 / 3500 ≈ 1057?

Also, I saw in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size_3 that the default shard_size is size * 1.5 + 10.
Does that mean that "size" should be set to at most 6660 if max_buckets = 10000 (since 6660 * 1.5 + 10 = 10000)?
If I set size to 6000 there is no problem, but if I set it to 7000 I see the too_many_buckets_exception for A1 again.
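The arithmetic in the question can be checked with a short sketch. It assumes, as the linked documentation describes, that on an index with more than one shard the default shard_size of a terms aggregation is size * 1.5 + 10 (truncated to an integer), and it treats "shard_size must not exceed search.max_buckets" as the working hypothesis; the function names are hypothetical.

```python
def default_shard_size(size: int) -> int:
    """Default shard_size of a terms aggregation on a multi-shard index
    (assumption taken from the docs): size * 1.5 + 10, truncated."""
    return int(size * 1.5 + 10)


def max_safe_size(max_buckets: int) -> int:
    """Largest `size` whose default shard_size stays <= max_buckets.

    For integer s, int(s * 1.5) == (3 * s) // 2, so we need
    (3 * s) // 2 <= max_buckets - 10, i.e. s <= (2 * (max_buckets - 10) + 1) // 3.
    """
    return (2 * (max_buckets - 10) + 1) // 3


for size in (6000, 6660, 7000):
    print(size, default_shard_size(size))
print("max size for 10000 buckets:", max_safe_size(10000))
```

With these numbers, size 6000 gives a shard_size of 9010 and 7000 gives 10510, which lines up with the behaviour described above; whether the limit is enforced on exactly this per-shard value depends on the Elasticsearch version.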

Which value, max_buckets or "size", should I change to avoid these exceptions? What calculation should I do?

Thank you in advance!

BR
Paraskevi

How long is the time range that you want to display?
If the time range is too wide, you need to increase the minimum interval of the buckets.
The higher the minimum interval, the fewer buckets you need.
If you need to keep the minimum interval you want and you still hit max_buckets, you can increase search.max_buckets using:

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}
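The same settings update can also be sent from a script. This is a minimal sketch using only Python's standard library; the cluster URL http://localhost:9200 and the absence of authentication are assumptions, 20000 is just the example value from the request above, and the helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no auth/TLS


def build_max_buckets_request(limit: int, base_url: str = ES_URL) -> urllib.request.Request:
    """Build a PUT _cluster/settings request that persistently sets search.max_buckets."""
    body = {"persistent": {"search.max_buckets": limit}}
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = build_max_buckets_request(20000)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply the setting: print(send(req))
```

Building the request separately from sending it makes it easy to inspect the exact body before changing a cluster-wide setting.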

Hi Fadjar,

Thanks for the prompt answer. Basically, my concern is to understand how the number of buckets is calculated. I don't want to change the interval. I know the number of unique values of some fields, so I would like to calculate the "possible" max_buckets number instead of increasing it to a "big enough" number, because in that case I don't know whether such a change would, for example, increase CPU usage.
Could you please explain based on the example given above?
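One way to estimate the "possible" number of buckets from known distinct-value counts is sketched below. It is a rough model, not Elasticsearch's exact accounting: it assumes each options-list control runs its own terms aggregation, and that a shard returns at most min(distinct values, shard_size) buckets, with shard_size = size * 1.5 + 10 when more than one shard is queried. The count for B comes from the question; the values for C and D are hypothetical placeholders.

```python
def default_shard_size(size: int, num_shards: int) -> int:
    """Default shard_size of a terms aggregation (assumption from the docs):
    size * 1.5 + 10 when more than one shard is queried, otherwise size."""
    return int(size * 1.5 + 10) if num_shards > 1 else size


def estimated_buckets(distinct_values: int, size: int, num_shards: int) -> int:
    """Rough upper bound on the buckets one shard returns for a terms aggregation."""
    return min(distinct_values, default_shard_size(size, num_shards))


# Distinct values per field once A = A1 is selected.
# B comes from the question; C and D are HYPOTHETICAL placeholders.
cardinalities = {"B": 1028, "C": 50_000, "D": 200}
SIZE = 10_000
MAX_BUCKETS = 10_000
NUM_SHARDS = 12  # the error message mentioned 12 shards

estimates = {
    field: estimated_buckets(distinct, SIZE, NUM_SHARDS)
    for field, distinct in cardinalities.items()
}
for field, estimate in estimates.items():
    status = "exceeds" if estimate > MAX_BUCKETS else "within"
    print(f"{field}: ~{estimate} buckets per shard ({status} max_buckets)")

# Under this model, the smallest max_buckets that avoids the exception
# is the largest per-control estimate.
print("Smallest max_buckets under this model:", max(estimates.values()))
```

Because the estimate is capped by shard_size, lowering "size" (as observed with 6000) and raising max_buckets are two ways of closing the same gap.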

If you're using a date histogram and don't want to change the cluster setting, just reduce the time range of the dashboard.
The number of buckets counted against search.max_buckets depends on the minimum interval and the time range of the query you use.
A wider time range with a smaller minimum interval makes the number of buckets grow.
In my experience, this max_buckets problem occurs when I query a long time range.
Another solution, which I have used, is transforms.

Regards,
Fadjar Tandabawana
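For the date-histogram case described in the reply above, the bucket count is roughly the time range divided by the interval. The sketch below assumes fixed-length intervals (calendar intervals such as months vary in length) and ignores extended_bounds and any extra bucket caused by a range that is not aligned to the interval; the function names are hypothetical.

```python
import math
from datetime import timedelta


def histogram_buckets(time_range: timedelta, interval: timedelta) -> int:
    """Approximate number of date_histogram buckets for a fixed interval."""
    return math.ceil(time_range / interval)


def min_interval(time_range: timedelta, max_buckets: int) -> timedelta:
    """Smallest fixed interval that keeps the bucket count <= max_buckets."""
    return time_range / max_buckets


one_year = timedelta(days=365)
print(histogram_buckets(one_year, timedelta(minutes=30)))  # 17520
print(histogram_buckets(one_year, timedelta(hours=1)))     # 8760
print(min_interval(one_year, 10_000))                      # 0:52:33.600000
```

This shows why a wide time range combined with a small minimum interval quickly goes past a 10000-bucket limit.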
