ES aggregation query

I am running a huge aggregation query and getting the following error.

This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests.

This is my query.

{
  "aggs": {
    "projectname": {
      "terms": { "field": "project.keyword", "order": { "_count": "desc" } },
      "aggs": {
        "username": {
          "terms": { "field": "user.keyword", "order": { "_count": "desc" } },
          "aggs": {
            "currdir": {
              "terms": { "field": "CWD.keyword", "order": { "_count": "desc" } },
              "aggs": {
                "reqmem": {
                  "terms": { "field": "reqmem", "order": { "_count": "desc" } },
                  "aggs": {
                    "reqres": {
                      "terms": { "field": "reqres.keyword", "order": { "_count": "desc" } },
                      "aggs": {
                        "noproc": { "max": { "field": "no_proc" } },
                        "mm": { "max": { "field": "max_mem" } },
                        "avgmem": { "avg": { "field": "max_mem" } },
                        "rt": { "max": { "field": "run_time" } },
                        "avgrt": { "avg": { "field": "run_time" } },
                        "pcm": { "max": { "field": "per_core_memory" } },
                        "avgpcm": { "avg": { "field": "per_core_memory" } },
                        "ptime": { "max": { "field": "pend_time" } },
                        "avgptime": { "avg": { "field": "pend_time" } },
                        "cputime": { "max": { "field": "ru_utime" } },
                        "avgcputime": { "avg": { "field": "ru_utime" } }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        { "match_all": {} },
        { "match_phrase": { "cluster": { "query": "abc01" } } },
        { "match_phrase": { "queue": { "query": "cxx64" } } },
        { "range": { "@timestamp": { "gte": "2020-09-01T00:00:00", "lte": "2020-09-30T23:59:59" } } }
      ]
    }
  }
}

Our ELK admin will not allow us to update the "search.max_buckets" value.
Any idea how to fix this?

Can you provide some more details?

  • Elasticsearch version?
  • According to your query, you need project * user * cwd * reqmem * reqres buckets. I guess that's way more than 10k; do you have an idea how many buckets this requires? AFAIK aggregations stop as soon as they overflow, so the actual number is more than 10001.
  • How often do you intend to run this query?
  • What do you intend to do with the result?
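A rough sketch of why the count explodes with five nested `terms` levels: each level returns at most min(distinct values, `size`) buckets per parent bucket, so the limits multiply down the tree. Note that none of the `terms` aggregations in the query set `size`, so each level keeps only its top 10 terms (the default). The cardinalities below are made-up numbers for illustration, and the function name is hypothetical.

```python
from math import prod


def bucket_upper_bound(levels):
    """Upper bound on buckets produced by nested terms aggregations.

    `levels` lists (distinct_values, terms_size) per level, outermost first.
    A level returns at most min(distinct_values, terms_size) buckets for each
    parent bucket, so the limits multiply down the tree.
    Returns (leaf_buckets, buckets_across_all_levels).
    """
    per_level = [min(distinct, size) for distinct, size in levels]
    leaf = prod(per_level)
    # Level k holds at most per_level[0] * ... * per_level[k] buckets.
    total = sum(prod(per_level[: depth + 1]) for depth in range(len(per_level)))
    return leaf, total


# Hypothetical cardinalities for project, user, CWD, reqmem, reqres, each with
# the default terms size of 10 (the query never sets "size").
print(bucket_upper_bound([(20, 10), (50, 10), (200, 10), (10, 10), (5, 10)]))
# -> (50000, 61110)
```

The real number depends on which combinations actually occur in the data, so this is only a ceiling.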

As the error message says, use a composite aggregation. If you want to do further analysis based on the output of the query, you should consider a transform, which is basically a composite aggregation that stores its results as documents. Your query makes me think you want monthly buckets in addition to the groupings.
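A minimal sketch of the same grouping rewritten as a single `composite` aggregation, built as a Python dict so it can be sent with any client. The five nested `terms` levels become the composite `sources`, and the metrics move under the composite as sub-aggregations. The aggregation name `jobs` and the page size of 1000 are assumptions. Composite buckets come back sorted by key, so the original `"order": { "_count": "desc" }` has no equivalent here and sorting by count has to happen on the client side.

```python
import json

# Group-by levels of the original nested terms aggregations, outermost first:
# (source name in the composite key, field to group on).
SOURCES = [
    ("projectname", "project.keyword"),
    ("username", "user.keyword"),
    ("currdir", "CWD.keyword"),
    ("reqmem", "reqmem"),
    ("reqres", "reqres.keyword"),
]

# Metric sub-aggregations copied unchanged from the original query.
METRICS = {
    "noproc": {"max": {"field": "no_proc"}},
    "mm": {"max": {"field": "max_mem"}},
    "avgmem": {"avg": {"field": "max_mem"}},
    "rt": {"max": {"field": "run_time"}},
    "avgrt": {"avg": {"field": "run_time"}},
    "pcm": {"max": {"field": "per_core_memory"}},
    "avgpcm": {"avg": {"field": "per_core_memory"}},
    "ptime": {"max": {"field": "pend_time"}},
    "avgptime": {"avg": {"field": "pend_time"}},
    "cputime": {"max": {"field": "ru_utime"}},
    "avgcputime": {"avg": {"field": "ru_utime"}},
}

# Filter from the original query; the redundant match_all clause is dropped.
QUERY = {
    "bool": {
        "must": [
            {"match_phrase": {"cluster": {"query": "abc01"}}},
            {"match_phrase": {"queue": {"query": "cxx64"}}},
            {"range": {"@timestamp": {"gte": "2020-09-01T00:00:00",
                                      "lte": "2020-09-30T23:59:59"}}},
        ]
    }
}


def build_composite_body(page_size=1000, after=None):
    """Request body for one page of the composite aggregation.

    `page_size` and the aggregation name "jobs" are illustrative choices.
    Pass the key of the previous page's last bucket as `after` for the next page.
    """
    composite = {
        "size": page_size,
        # Each source becomes one component of the bucket key; buckets are
        # returned sorted by this key, not by doc count.
        "sources": [{name: {"terms": {"field": field}}} for name, field in SOURCES],
    }
    if after is not None:
        composite["after"] = after
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "query": QUERY,
        "aggs": {"jobs": {"composite": composite, "aggs": METRICS}},
    }


if __name__ == "__main__":
    print(json.dumps(build_composite_body(), indent=2))
```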

Thanks Hendrik.
Here are the details:

1- Elasticsearch version: 6.2.4
2- Number of buckets: ~40k
3- How often the query runs: 1-2 times a week
4- Use of the result: collect the data and analyze the workload

In this case composite aggregation is your best option.
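Paging works by sending the request, collecting the buckets, then repeating it with `after` set to the key of the last page until a page comes back empty. The sketch below takes the search call as a plain function, so it does not depend on a particular client. As far as I know, the `after_key` field only appears in responses from 6.3 onward, so on 6.2.4 the loop falls back to the key of the last bucket; the function name is hypothetical.

```python
import copy
from typing import Any, Callable, Dict, Iterator

Body = Dict[str, Any]


def iter_composite_buckets(
    search: Callable[[Body], Body], body: Body, agg_name: str
) -> Iterator[Body]:
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` sends a request body and returns the parsed JSON response;
    `body` must contain aggs[agg_name]["composite"] and is not modified.
    """
    body = copy.deepcopy(body)  # keep the caller's body free of "after"
    composite = body["aggs"][agg_name]["composite"]
    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:  # an empty page means everything has been read
            return
        yield from buckets
        # Responses from 6.3 on carry "after_key"; on 6.2.x fall back to the
        # key of the last bucket of the page.
        composite["after"] = agg.get("after_key", buckets[-1]["key"])
```

With the official `elasticsearch` Python client, `search` could be `lambda b: es.search(index="lsf-jobs-*", body=b)`, where the index pattern is a placeholder. Each page costs one request, so a larger composite `size` means fewer round trips.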

With newer versions this might get easier:

  • transform (>= 7.5)
  • search.max_buckets defaults to 65k (>= 7.9)
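For clusters on 7.5 or later, a sketch of a transform config body that runs the same grouping and writes one document per group into a destination index. It would be sent with `PUT _transform/<transform_id>`. Assumptions: monthly buckets via a `date_histogram` on `@timestamp` (following the guess above that monthly grouping is wanted, so the fixed September range is left out), hypothetical index names, and only two of the eleven original metrics shown. Check the transform reference for your exact version before using it.

```python
import json

# Hypothetical index names; replace with the real ones.
SOURCE_INDEX = "lsf-jobs-*"
DEST_INDEX = "lsf-jobs-monthly"


def build_transform_config(source_index, dest_index):
    """Body for PUT _transform/<transform_id> (Elasticsearch 7.5+).

    Groups by month plus the five fields of the original query and writes one
    document per group into `dest_index`.
    """
    group_by = {
        # Assumption: monthly buckets replace the fixed date range.
        "month": {"date_histogram": {"field": "@timestamp", "calendar_interval": "1M"}},
        "projectname": {"terms": {"field": "project.keyword"}},
        "username": {"terms": {"field": "user.keyword"}},
        "currdir": {"terms": {"field": "CWD.keyword"}},
        "reqmem": {"terms": {"field": "reqmem"}},
        "reqres": {"terms": {"field": "reqres.keyword"}},
    }
    return {
        "source": {
            "index": source_index,
            "query": {"bool": {"must": [
                {"match_phrase": {"cluster": {"query": "abc01"}}},
                {"match_phrase": {"queue": {"query": "cxx64"}}},
            ]}},
        },
        "dest": {"index": dest_index},
        "pivot": {
            "group_by": group_by,
            # Only two metrics shown; the others follow the same pattern.
            "aggregations": {
                "mm": {"max": {"field": "max_mem"}},
                "avgmem": {"avg": {"field": "max_mem"}},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_transform_config(SOURCE_INDEX, DEST_INDEX), indent=2))
```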

Can you guide me on how to implement a composite aggregation?

Thanks
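For the workload analysis, the collected composite buckets can be flattened into CSV rows. This sketch also sorts by `doc_count` in memory to restore the `_count desc` ordering of the original query, which seems reasonable for around 40k buckets. It assumes the bucket shape of a composite response (`key`, `doc_count`, and one `{"value": ...}` object per metric); the function name is hypothetical.

```python
import csv
import io


def buckets_to_csv(buckets, key_fields, metric_names):
    """Flatten composite-aggregation buckets into CSV text.

    Each bucket looks like {"key": {...}, "doc_count": n, "<metric>": {"value": v}}.
    Missing metric values (null in the response) become empty cells.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([*key_fields, "doc_count", *metric_names])
    # Composite buckets arrive in key order; sort here to mimic "_count": "desc".
    for bucket in sorted(buckets, key=lambda b: b["doc_count"], reverse=True):
        row = [bucket["key"].get(field) for field in key_fields]
        row.append(bucket["doc_count"])
        row.extend(bucket.get(name, {}).get("value") for name in metric_names)
        writer.writerow(row)
    return out.getvalue()
```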
