Questions about terms aggregations and too many buckets exception

Ivan · December 28, 2022, 12:35am

This is a question about internals, so I hope someone can help with a few somewhat vague questions. I don't actually have access to the cluster to test my hypotheses.

First of all, according to the docs:

"The search.max_buckets cluster setting limits the number of buckets allowed in a single response."

Is the number of buckets truly counted per response, or per aggregation (including its sub-aggregations)? If I have four terms aggregations, each with a size equal to half of max_buckets (and there is enough data to fill them), would the exception occur?
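To make the two readings of the question concrete, here is a small arithmetic sketch. It is a toy model, not Elasticsearch's actual accounting: it assumes each terms bucket counts as exactly one bucket, and it uses a hypothetical limit of 10,000 (the real value depends on the version and on the cluster's `search.max_buckets` setting). The function names are made up for illustration.

```python
# Toy model of the two possible readings of search.max_buckets.
# This is NOT Elasticsearch's real bucket accounting, just the question's arithmetic.
MAX_BUCKETS = 10_000  # hypothetical limit chosen for illustration


def exceeds_per_response(agg_sizes, limit):
    """Reading 1: all buckets in the response share a single budget."""
    return sum(agg_sizes) > limit


def exceeds_per_aggregation(agg_sizes, limit):
    """Reading 2: each aggregation is checked against the limit on its own."""
    return any(size > limit for size in agg_sizes)


# Four terms aggregations, each sized to half the limit and fully populated.
sizes = [MAX_BUCKETS // 2] * 4

print(exceeds_per_response(sizes, MAX_BUCKETS))     # True  (20,000 > 10,000)
print(exceeds_per_aggregation(sizes, MAX_BUCKETS))  # False (5,000 <= 10,000 each)
```

Under the first reading the request would fail; under the second it would not, which is exactly the distinction the question is asking about.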

What constitutes a bucket for a top hits aggregation: each individual hit, one bucket for all hits, or none at all?

Here is a pseudo aggregation request:

{
  "query": {
    ...
  },
  "aggs": {
    "top_term_agg": {
      "terms": {
        "field": "somefield",
        "size": 5000,
        "min_doc_count": 2
      },
      "aggs": {
        "th_0": {
          "filter": {
            ...
          },
          "aggs": {
            "top_hits": {
              ...
            }
          }
        },
        "th_1": {
          ...
        },
        "min_bucket_selector": {
          "bucket_selector": {
            ...
          }
        }
      }
    }
  }
}

The top-level aggregation contains a variable number of filtered sub-aggregations (the number varies per request, but within a request every term has the same number of sub-aggregations). The response looks something like this:

"aggregations": {
  "top_term_agg": {
    "meta": {},
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "somekey",
        "doc_count": 60,
        "th_1": {
          "doc_count": 30,
          "top_hits": {
            "hits": {
              ...
            }
          }
        },
        "th_0": {
          "doc_count": 30,
          "top_hits": {
            "hits": {
              ...
            }
          }
        }
      }
    ]
  }
}
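Because the number of filtered sub-aggregations varies per request, a request shaped like the pseudo request above could be generated programmatically. The sketch below is one way to do that, assuming the filter clauses, the `top_hits` options and the `bucket_selector` body (all elided as "..." in the original) are supplied by the caller. `build_aggs`, the `status` field, the example filter values, the `top_hits` size and the selector script are all hypothetical.

```python
import json


def build_aggs(field, filters, top_hits_opts, selector, size=5000, min_doc_count=2):
    """Build the "aggs" section: one terms agg with one filter + top_hits
    sub-aggregation per filter clause, plus a bucket_selector.

    `filters`, `top_hits_opts` and `selector` stand in for the parts elided
    in the pseudo request; they are caller-supplied placeholders.
    """
    sub_aggs = {}
    for i, clause in enumerate(filters):
        sub_aggs[f"th_{i}"] = {
            "filter": clause,
            # Aggregation named "top_hits" of type top_hits, as in the response.
            "aggs": {"top_hits": {"top_hits": top_hits_opts}},
        }
    sub_aggs["min_bucket_selector"] = {"bucket_selector": selector}
    return {
        "top_term_agg": {
            "terms": {"field": field, "size": size, "min_doc_count": min_doc_count},
            "aggs": sub_aggs,
        }
    }


# Hypothetical inputs, for illustration only.
filters = [{"term": {"status": "a"}}, {"term": {"status": "b"}}]
selector = {"buckets_path": {"c": "_count"}, "script": "params.c >= 2"}

aggs = build_aggs("somefield", filters, {"size": 3}, selector)
print(json.dumps(aggs, indent=2))
```

Each extra filter clause adds one more `th_N` sub-aggregation under every term bucket, which is why the per-term fan-out matters for the bucket count.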

Since each term bucket in the top-level aggregation contains two filtered sub-aggregations, would the total number of buckets be 5000 * 2? Does the top hits aggregation add to the total number of buckets? And does the max_buckets check happen before or after the bucket_selector runs?
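The candidate totals behind these questions can be written out explicitly. Which rule (if any) matches Elasticsearch's internal count is exactly what is being asked, so this sketch does not pick one. The counting rules, the function name and the `top_hits` size of 3 are hypothetical.

```python
def candidate_totals(num_terms, filters_per_term, hits_per_top_hits):
    """Bucket totals under different hypothetical counting rules.

    None of these is confirmed Elasticsearch behavior.
    """
    filter_buckets = num_terms * filters_per_term
    return {
        # Only the filter sub-aggregation buckets are counted.
        "filter_buckets_only": filter_buckets,
        # Terms buckets plus the filter buckets nested under each one.
        "terms_plus_filters": num_terms + filter_buckets,
        # As above, but every top_hits hit is also counted as a bucket.
        "terms_filters_and_hits": num_terms + filter_buckets
        + filter_buckets * hits_per_top_hits,
    }


print(candidate_totals(5000, 2, 3))
# {'filter_buckets_only': 10000, 'terms_plus_filters': 15000, 'terms_filters_and_hits': 45000}
```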

I know I have tools such as increasing max_buckets, lowering shard_size, or using terminate_after, but I am trying to determine where the limits are, and perhaps guide how users build their requests first.
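For reference, the three mitigations mentioned above map onto request bodies roughly as follows. This is a sketch that only builds the JSON bodies and does not send them: the cluster settings body would go to `PUT _cluster/settings`, and the search body to the index's `_search` endpoint. All numeric values are hypothetical, and the original query is left out because it was elided.

```python
import json

# Body for PUT _cluster/settings, raising the limit (20000 is a hypothetical value).
cluster_settings = {"persistent": {"search.max_buckets": 20000}}

# Search body limiting per-shard work (values hypothetical):
# - terminate_after caps the number of documents collected per shard
# - shard_size caps how many terms each shard returns for the terms agg
search_body = {
    "terminate_after": 100000,
    "aggs": {
        "top_term_agg": {
            "terms": {"field": "somefield", "size": 5000, "shard_size": 5000}
        }
    },
}

print(json.dumps(cluster_settings))
print(json.dumps(search_body, indent=2))
```

Note that terminate_after trades completeness for speed: counts and top hits may then reflect only part of the matching documents.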

system · January 25, 2023, 12:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
