Bucket size control for sub-aggregation

Hi there,

I am using multi-level sub-aggregations in my application, and I noticed that the size/shard_size parameters do not seem to be used at runtime to control the bucket size of the secondary or deeper levels of aggregation. Could you please help me understand whether this is expected behavior?

For example, in the following query I run a two-level aggregation with a size of 100 specified at each level, on a data set where there are, say, 200 unique parent_field values in each shard and, for each unique parent_field value, 2,000 unique child_field values.

What I observed is that, during aggregation, Elasticsearch pulls 160 buckets per shard (that is, 100 * 1.5 + 10) for the first level of the aggregation, as expected. But for each parent_field bucket it actually creates 2,000 child_field buckets for the secondary aggregation during collection, and only during the reduce phase does it trim the result back to 100 buckets.
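The 160 figure above comes from the documented default for a terms aggregation's shard_size, which a small sketch makes explicit:

```python
# Sketch of Elasticsearch's documented default for a terms aggregation:
# when shard_size is not set, each shard keeps (size * 1.5) + 10 buckets.
def default_shard_size(size: int) -> int:
    return int(size * 1.5) + 10

print(default_shard_size(100))  # 160, matching the observed first-level count
```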

Given this behavior, if the cardinality of child_field is very high for each parent, for example 200K:1 instead of 2,000:1, then the sub-aggregation could potentially create a huge number of buckets during collection, and as I understand it, the search.max_buckets limit is not enforced until the end.
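A back-of-the-envelope sketch of the worst case, assuming (hypothetically) that every child term appears under every collected parent bucket, with the 160 collected parents from the example above:

```python
# Rough worst-case count of transient buckets created during collection,
# assuming every child_field term occurs under every collected parent bucket.
# The 160 parents and the child cardinalities are this thread's example numbers.
def worst_case_buckets(parents_collected: int, child_cardinality: int) -> int:
    return parents_collected + parents_collected * child_cardinality

print(worst_case_buckets(160, 2_000))    # 320,160 at 2,000:1
print(worst_case_buckets(160, 200_000))  # 32,000,160 at 200K:1
```

Either figure dwarfs the requested size of 100, which is why trimming only at reduce time can strain memory.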

Could you please let me know if my understanding is correct? If so, do you have any recommendation on how to restrict the number of buckets created by the secondary aggregation without risking tripping the request/parent circuit breaker?

Thanks

{
  "query": {
    "bool": {
      "must": [
        {
        ...
        }
      ],
      "filter": [
        {
          "bool": {
            "must": [
             ...
            ]
          }
        }
      ]
    }
  },
  "aggregations": {
    "parent": {
      "terms": {
        "field": "parent_field",
        "size": 100,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "child": {
          "terms": {
            "field": "child_field",
            "size": 100,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_key": "asc"
              }
            ]
          }
        }
      }
    }
  }
}
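For reference, here is a sketch (as a Python dict mirroring the JSON above) of setting shard_size explicitly on both terms levels rather than relying on the (size * 1.5) + 10 default. Note that, per my understanding above, shard_size bounds how many buckets each shard returns per level, not necessarily how many are created during collection, which is exactly the behavior I am asking about:

```python
# Hypothetical variant of the query's aggregation section with an explicit
# shard_size on each terms level. shard_size is a documented terms-agg
# parameter; whether it limits buckets created during collection (rather
# than just those returned per shard) is the open question in this thread.
aggs = {
    "parent": {
        "terms": {
            "field": "parent_field",
            "size": 100,
            "shard_size": 160,  # cap buckets returned per shard at this level
        },
        "aggregations": {
            "child": {
                "terms": {
                    "field": "child_field",
                    "size": 100,
                    "shard_size": 160,  # same cap for the nested level
                }
            }
        },
    }
}

print(aggs["parent"]["aggregations"]["child"]["terms"]["shard_size"])
```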

Any clue?

My guess is that this is either a limitation of Elasticsearch or a use case complicated enough that no one has run into it before.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.