Uptime: `Error GraphQL error: [too_many_buckets_exception] Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]`

Hey Elastic friends! I recently set up my Elastic cluster on the new 7.0.0 release to use the Uptime visualizations. I got everything working well, but now when I look at the Uptime tab I see this error underneath the uptime summary:

Error GraphQL error: [too_many_buckets_exception] Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting., with { max_buckets=10000 }

I came across the thread "Requesting background info on `search.max_buckets` change", which says this value is configurable; however, I am not sure where to configure it or what effect changing it might have on the rest of my cluster. Additionally, aside from having configured ILM, everything is at its default setting, so perhaps the defaults should be changed?
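From what I can gather, the setting can be changed dynamically through the cluster settings API. A minimal sketch of what I think that would look like with the Python client is below (the value of 20000 is just an example, and I haven't tried it since I don't know the side effects):

```python
# Untested sketch: raise search.max_buckets via the cluster settings API
# using the official Python client. 20000 is only an illustrative value,
# not a recommendation.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust host/auth for your cluster

es.cluster.put_settings(body={"persistent": {"search.max_buckets": 20000}})
print(es.cluster.get_settings(include_defaults=False))
```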

I look forward to hearing suggestions for how I can remediate this issue; thanks!

Hi @phillhocking,

I think that from a scalability standpoint we can treat this as a bug; you shouldn't need to modify your cluster settings in order to view a relatively large number of monitors over an arbitrary span of time.

Can you let me know some things so I can try to make sure we've covered your case? It looks like you're running 7.0 with 26 monitors. What is the selected date range and how frequently are your monitors pinging?

Thank you for your prompt reply, @jkambic! The range is 'Last 1 hour'; my three http monitors check every 10 seconds, and the icmp/tcp monitors that make up the remainder check every 5 seconds.
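For what it's worth, a rough back-of-the-envelope count, assuming one histogram bucket per monitor per check (which may not be how Uptime actually aggregates), already puts an hour well over the default limit:

```python
# Naive estimate: one bucket per monitor per check interval over the
# selected range. Treat this as a lower bound only -- the Uptime app
# may create more buckets per monitor than this.
range_seconds = 60 * 60               # 'Last 1 hour'

http_monitors = 3                     # checked every 10 seconds
other_monitors = 26 - http_monitors   # icmp/tcp, checked every 5 seconds

buckets = (http_monitors * range_seconds / 10
           + other_monitors * range_seconds / 5)
print(buckets)  # 17640.0 -- well over the 10000 default limit
```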

Did I mess up by putting my reply in the thread instead of replying directly to your comment?

Nope not at all, I'll get notified of any post to this thread.

I will try to reproduce your case. I am surprised you're seeing this; we've benchmarked the app at much higher volumes than you are reporting.

Thanks for reaching out!

@phillhocking to give you an update, I created this issue to track the bug. We'll try to address it as soon as we have time, and backport it to the version you're using.


This is excellent, @jkambic, and I really appreciate all of your hard work on this. Is there something I can do to remediate this on my cluster while I am waiting for the patch/release that solves the issue?

@phillhocking the error is the result of too much data being selected at a given time; the best way to resolve this temporarily is to select a smaller slice of data. Does a range like now-15m to now cause the error as well? If it doesn't, I'd start there and work your way up to a wider range until you see the error, then treat the last range that worked as the widest. I know that's not ideal, but it would be a temporary solution.
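If it helps as a starting point, the same naive one-bucket-per-check arithmetic from above gives a rough upper bound to begin narrowing from; in practice the app can hit the limit on narrower ranges than this predicts, so verify empirically:

```python
# Starting guess for the widest usable range, under the same naive
# one-bucket-per-check assumption; the real aggregation can hit the
# limit sooner, so widen/narrow empirically from here.
max_buckets = 10000
buckets_per_second = 3 / 10 + 23 / 5   # 3 http @ 10s, 23 icmp/tcp @ 5s
widest_range_seconds = max_buckets / buckets_per_second
print(widest_range_seconds / 60)       # ~34 minutes
```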

EDIT: I've opened a PR related to this. We'll try to get it reviewed and merged next week.

Thank you for your suggestion, @jkambic. I have done this, and a range of 15 minutes or less seems to work.

Strangely enough, however, even a range of 18 minutes triggers the error condition.


This topic was automatically closed 24 days after the last reply. New replies are no longer allowed.