Is there a way to batch a query when facing a too_many_buckets_exception?

I use the following query:

{
  "aggs": {
    "2": {
      "terms": {
        "field": "deviceID.keyword",
        "order": {
          "_key": "desc"
        },
        "size": 500000
      },
      "aggs": {
        "1": {
          "top_hits": {
            "_source": "deviceIP",
            "size": 1
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-30d"
            }
          }
        }
      ],
      "filter": [
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

I am getting the following exception, which is quite self-explanatory.

"reason" : {
      "type" : "too_many_buckets_exception",
      "reason" : "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets" : 10000
    }

Can I split the query into batches so that I don't have to increase the search.max_buckets setting? Or is increasing that setting the only way to get the IPs for roughly 50k devices?

It's a bit of a different query, but this is supported by the Composite aggregation. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html

The composite aggregation is more limited in functionality because it doesn't accept sorting by metrics, but you are already sorting by _key, which is the default behavior of the composite aggregation.

Within each terms bucket of your composite aggregation you can nest your top_hits aggregation.
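As a rough sketch of that idea (untested against a real cluster), the request body could look something like the one below. It is written as a Python dict so it can be serialized with json.dumps and pasted into a request. The field names come from the original query; the aggregation names `devices` and `device_ip`, the helper `build_body`, and the page size of 1000 are assumptions chosen for illustration.

```python
import json


def build_body(page_size=1000, after_key=None):
    """Build a composite-aggregation request body (hypothetical helper).

    page_size: buckets returned per request; keep it well below search.max_buckets.
    after_key: the "after_key" from the previous response, or None for the first page.
    """
    composite = {
        "size": page_size,
        "sources": [
            # One terms source on deviceID, sorted descending like the original "_key": "desc".
            {"deviceID": {"terms": {"field": "deviceID.keyword", "order": "desc"}}}
        ],
    }
    if after_key is not None:
        composite["after"] = after_key

    return {
        "size": 0,  # no search hits, only aggregation results
        "query": {
            "bool": {"filter": [{"range": {"@timestamp": {"gte": "now-30d"}}}]}
        },
        "aggs": {
            "devices": {
                "composite": composite,
                # Same nested top_hits as in the original terms aggregation.
                "aggs": {
                    "device_ip": {"top_hits": {"_source": "deviceIP", "size": 1}}
                },
            }
        },
    }


print(json.dumps(build_body(), indent=2))
```

Each response should then contain at most `page_size` buckets, so the 10000-bucket limit is not hit by a single request.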

Thanks for the input!

As I am a bit of an ES noob, this is all very advanced to me and I don't know where to begin. Could you give me a hint of what my final query would look like?

Hello,

I think you can page through the aggregation results with the after parameter of the composite aggregation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html#_after

Per the documentation, this is an example:

GET /_search
{
    "aggs" : {
        "my_buckets": {
            "composite" : {
                 "sources" : [
                    { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } },
                    { "product": { "terms": {"field": "product" } } }
                ]
            },
            "aggregations": {
                "the_avg": {
                    "avg": { "field": "price" }
                }
            }
        }
    }
}
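The documentation example above is only the first page. To fetch the following pages, each response's `after_key` is sent back as the `after` parameter of the next request, until no more buckets come back. A minimal sketch of that loop follows, assuming a hypothetical `search` callable that sends a request body to Elasticsearch and returns the parsed JSON response as a dict. The function name `iter_composite_buckets` is also made up for illustration, and the body is assumed to use the top-level key "aggs".

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation, page by page.

    search:   hypothetical callable, takes a request body (dict) and
              returns the parsed search response (dict).
    body:     request body with body["aggs"][agg_name]["composite"] defined.
    agg_name: name of the composite aggregation in the body.
    """
    body = copy.deepcopy(body)  # leave the caller's body untouched
    composite = body["aggs"][agg_name]["composite"]

    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            break  # an empty page means everything has been read
        yield from buckets

        after_key = agg.get("after_key")
        if after_key is None:
            break
        # Resume the next request right after the last bucket of this page.
        composite["after"] = after_key
```

Because each request only asks for one page of buckets, the search.max_buckets setting can stay at its default as long as the composite `size` is below it.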
