Why doesn't ES stop my aggregation instead of just crashing?

(Andrea Rota) #1

Hello, I am learning Elasticsearch basics and I am dealing with an Out of Memory error when performing aggregations with a large number of buckets.

I already know that for aggregations with 10,000+ buckets I should use a composite aggregation, but sometimes this is not possible (e.g. queries auto-generated by Grafana). I don't understand why ES lets me run a query that crashes it instead of rejecting it beforehand.

I crafted a simple example.

I create foo_index with a single document:

POST foo_index/foo_type/1
{
  "ts": "2018-10-20T10:00:00Z",
  "value": 10
}

Then I perform a very heavy aggregation query on it:

{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "ts": {
            "gte": "1980-01-01T00:00:00Z",
            "lte": "2019-01-01T00:00:00Z"
          }
        }
      }
    }
  },
  "aggs": {
    "by_ts": {
      "date_histogram": {
        "field": "ts",
        "interval": "10s",
        "extended_bounds": {
          "min": "1980-01-01T00:00:00Z",
          "max": "2020-01-01T00:00:00Z"
        }
      },
      "aggs": {
        "avg_value": {
          "avg": {
            "field": "value"
          }
        }
      }
    }
  }
}

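Why is this query so heavy when the index holds a single document? A rough count helps. The sketch below assumes that extended_bounds makes the date_histogram emit one (mostly empty) bucket for every 10-second slot from min to max inclusive, and that min is already aligned to the interval; it ignores time-zone and rounding details. The helper name histogram_bucket_count is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def histogram_bucket_count(start: datetime, end: datetime, interval: timedelta) -> int:
    """Fixed-interval buckets from start to end inclusive (start assumed aligned)."""
    # timedelta // timedelta yields an int: the number of whole intervals.
    return (end - start) // interval + 1


# Values taken from the extended_bounds of the query above.
EXT_MIN = datetime(1980, 1, 1, tzinfo=timezone.utc)
EXT_MAX = datetime(2020, 1, 1, tzinfo=timezone.utc)

n = histogram_bucket_count(EXT_MIN, EXT_MAX, timedelta(seconds=10))
print(f"{n:,}")  # 126,230,401
```

Each of those roughly 126 million buckets also carries its own avg_value sub-aggregation, which is about 12,600 times the 10,000-bucket limit discussed later in the thread.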
After a few seconds, the JVM starts heavy garbage collection:

[2018-09-11T17:44:59,949][WARN ][o.e.m.j.JvmGcMonitorService] [AmQ_BYj] [gc][70] overhead, spent [2.4s] collecting in the last [2.6s]
[2018-09-11T17:45:15,822][WARN ][o.e.m.j.JvmGcMonitorService] [AmQ_BYj] [gc][71] overhead, spent [14.2s] collecting in the last [15.8s]

After a while, it crashes with a Java heap OutOfMemoryError.

Can anybody explain why ES does not protect itself from this situation, for instance with a circuit breaker?

Edit: I tried ES 6.4.0 (Windows exe and Linux Docker), ES 6.3.1 (Linux Docker) with the same results.

(Jimferenczi) #2

We introduced a new cluster setting called search.max_buckets in 6.x. It is disabled by default in this version and will default to 10,000 in the next major version (v7).
So in 6.x you can set it manually on your cluster to protect against these killer queries. It is not enabled by default in 6.x because we consider it a breaking change that has to wait for a new major version. However, in 6.x we log a deprecation warning whenever an aggregation reaches the 10,000 limit, and the message explicitly links to the new setting.
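For reference, the settings body one would send with PUT _cluster/settings to turn the limit on is shown below as a Python dict ("persistent" survives a full cluster restart, "transient" does not). The rest is a hypothetical toy model, not Elasticsearch's implementation, of the behaviour described in this thread: count buckets as they are created and fail with an error once the count exceeds the limit, instead of building everything and exhausting the heap. The names BucketCounter and TooManyBucketsError are illustrative only; the message text mirrors the one quoted in the next reply.

```python
from typing import Any, Dict

# Body for PUT _cluster/settings enabling the 6.x limit.
MAX_BUCKETS_SETTINGS: Dict[str, Any] = {
    "persistent": {"search.max_buckets": 10000}
}


class TooManyBucketsError(Exception):
    """Hypothetical stand-in for too_many_buckets_exception."""


class BucketCounter:
    """Toy model: fail fast once more than max_buckets buckets exist."""

    def __init__(self, max_buckets: int) -> None:
        self.max_buckets = max_buckets
        self.count = 0

    def add(self, n: int = 1) -> None:
        # Check after every increment so the error fires at the first excess bucket.
        self.count += n
        if self.count > self.max_buckets:
            raise TooManyBucketsError(
                "Trying to create too many buckets. Must be less than or equal to: "
                f"[{self.max_buckets}] but was [{self.count}]."
            )


if __name__ == "__main__":
    counter = BucketCounter(MAX_BUCKETS_SETTINGS["persistent"]["search.max_buckets"])
    try:
        for _ in range(126_230_401):  # bucket count of the query above
            counter.add()
    except TooManyBucketsError as err:
        print(err)  # stops at bucket 10001 instead of allocating all of them
```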

(Andrea Rota) #3

Hi Jimczi, thank you for the quick response.

I set the limit and now everything works as intended. When it receives a killer query, the server returns an error like the following:

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "too_many_buckets_exception",
      "reason": "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets": 10000
    }
  },
  "status": 503
}

Does the search.max_buckets apply to composite aggregation too?

(Jimferenczi) #4

Does the search.max_buckets apply to composite aggregation too?

Yes, but you can paginate the composite aggregation, so the limit should not be a problem: retrieve up to 10,000 composite buckets per request, then use the after option to retrieve the next page of buckets.
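A minimal sketch of that pagination loop, assuming the same 10s buckets and avg sub-aggregation as the original query. It only builds request bodies and walks pages; search is a hypothetical callable (index, body) -> parsed JSON response that you would back with your HTTP client of choice, and fake_search is an in-memory stand-in used purely for illustration. The loop feeds after_key back as after when the response contains it and otherwise falls back to the last bucket's key, stopping at the first empty page.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical: (index, request body) -> parsed JSON response.
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def composite_body(size: int, after: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """One page of 10s date_histogram buckets with an avg sub-aggregation."""
    composite: Dict[str, Any] = {
        "size": size,  # keep at or below search.max_buckets
        "sources": [{"ts": {"date_histogram": {"field": "ts", "interval": "10s"}}}],
    }
    if after is not None:
        composite["after"] = after  # resume right after the previous page
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "by_ts": {
                "composite": composite,
                "aggs": {"avg_value": {"avg": {"field": "value"}}},
            }
        },
    }


def iter_composite_buckets(search: SearchFn, index: str, size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every bucket, fetching at most `size` buckets per request."""
    after: Optional[Dict[str, Any]] = None
    while True:
        agg = search(index, composite_body(size, after))["aggregations"]["by_ts"]
        buckets: List[Dict[str, Any]] = agg["buckets"]
        if not buckets:
            break  # an empty page means everything has been read
        yield from buckets
        after = agg.get("after_key", buckets[-1]["key"])


def fake_search(index: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical in-memory stand-in serving 25 sorted buckets."""
    data = [{"key": {"ts": i * 10_000}, "doc_count": 1, "avg_value": {"value": 10.0}}
            for i in range(25)]
    comp = body["aggs"]["by_ts"]["composite"]
    start = 0
    if "after" in comp:
        start = 1 + next(i for i, b in enumerate(data) if b["key"] == comp["after"])
    page = data[start:start + comp["size"]]
    agg: Dict[str, Any] = {"buckets": page}
    if page:
        agg["after_key"] = page[-1]["key"]
    return {"aggregations": {"by_ts": agg}}


if __name__ == "__main__":
    buckets = list(iter_composite_buckets(fake_search, "foo_index", size=10))
    print(len(buckets))  # 25, fetched as pages of 10, 10 and 5
```

Because each request asks for at most size buckets, no single response can trip search.max_buckets, however many buckets exist in total; the generator also lets a caller process pages as they arrive instead of holding everything in memory.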

(Andrea Rota) #5

Great, that's exactly the way we do it. Thank you very much.

