Bucket size control for sub-aggregation

Hi there,

I am using multi-level sub-aggregations in my application, and I noticed that the size/shard_size parameters do not seem to be used at runtime to control the bucket size of the secondary or deeper levels of aggregation. Could you please help me understand whether this is expected behavior?

For example, in the following query I run a two-level aggregation with a size of 100 specified at each level, on a data set where there are, say, 200 unique parent_field values in each shard and, for each unique parent_field value, 2,000 unique child_field values.

What I observed is that, during aggregation, Elasticsearch pulls 160 buckets per shard (that is, 100 * 1.5 + 10) for the first level of the aggregation, as expected. But for each parent_field bucket it actually creates 2,000 child_field buckets for the secondary aggregation during collection, and only during the reduce phase does it trim the result back to 100 buckets.
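The 160 figure above comes from the documented default for a terms aggregation's shard_size, which a small sketch makes explicit:

```python
# Sketch of Elasticsearch's documented default for a terms aggregation:
# when shard_size is not set, each shard keeps (size * 1.5) + 10 buckets.
def default_shard_size(size: int) -> int:
    return int(size * 1.5) + 10

print(default_shard_size(100))  # 160, matching the observed first-level count
```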

Given this behavior, if the cardinality of child_field is very high for each parent, for example 200K:1 instead of 2,000:1, then the sub-aggregation could potentially create a huge number of buckets during collection, and as I understand it, the search.max_buckets limit is not enforced until the end.
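A back-of-the-envelope sketch of the worst case, assuming (hypothetically) that every child term appears under every collected parent bucket, with the 160 collected parents from the example above:

```python
# Rough worst-case count of transient buckets created during collection,
# assuming every child_field term occurs under every collected parent bucket.
# The 160 parents and the child cardinalities are this thread's example numbers.
def worst_case_buckets(parents_collected: int, child_cardinality: int) -> int:
    return parents_collected + parents_collected * child_cardinality

print(worst_case_buckets(160, 2_000))    # 320,160 at 2,000:1
print(worst_case_buckets(160, 200_000))  # 32,000,160 at 200K:1
```

Either figure dwarfs the requested size of 100, which is why trimming only at reduce time can strain memory.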

Could you please let me know if my understanding is correct? If so, do you have any recommendation on how to restrict the number of buckets created by the secondary aggregation without risking tripping the request/parent circuit breaker?

Thanks

{
  "query": {
    "bool": {
      "must": [
        {
        ...
        }
      ],
      "filter": [
        {
          "bool": {
            "must": [
             ...
            ]
          }
        }
      ]
    }
  },
  "aggregations": {
    "parent": {
      "terms": {
        "field": "parent_field",
        "size": 100,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "child": {
          "terms": {
            "field": "child_field",
            "size": 100,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_key": "asc"
              }
            ]
          }
        }
      }
    }
  }
}
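For reference, here is a sketch (as a Python dict mirroring the JSON above) of setting shard_size explicitly on both terms levels rather than relying on the (size * 1.5) + 10 default. Note that, per my understanding above, shard_size bounds how many buckets each shard returns per level, not necessarily how many are created during collection, which is exactly the behavior I am asking about:

```python
# Hypothetical variant of the query's aggregation section with an explicit
# shard_size on each terms level. shard_size is a documented terms-agg
# parameter; whether it limits buckets created during collection (rather
# than just those returned per shard) is the open question in this thread.
aggs = {
    "parent": {
        "terms": {
            "field": "parent_field",
            "size": 100,
            "shard_size": 160,  # cap buckets returned per shard at this level
        },
        "aggregations": {
            "child": {
                "terms": {
                    "field": "child_field",
                    "size": 100,
                    "shard_size": 160,  # same cap for the nested level
                }
            }
        },
    }
}

print(aggs["parent"]["aggregations"]["child"]["terms"]["shard_size"])
```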

Any clue?

My guess is that this is either a limitation of Elasticsearch or a use case complicated enough that no one has run into it before.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.