CircuitBreakingException [parent] Data too large in 7.4.2

I'm hoping someone can help me understand what is causing this exception. This is being thrown frequently, both while writing to and reading from the cluster.

Here's an example error I received while running GET /_cat/indices?v in Kibana:

{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be [4075745992/3.7gb], which is larger than the limit of [4063657984/3.7gb], real usage: [4075745992/3.7gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=8989/8.7kb, in_flight_requests=0/0b, accounting=1803266/1.7mb]",
        "bytes_wanted": 4075745992,
        "bytes_limit": 4063657984,
        "durability": "PERMANENT"
      }
    ],
    "type": "circuit_breaking_exception",
    "reason": "[parent] Data too large, data for [<http_request>] would be [4075745992/3.7gb], which is larger than the limit of [4063657984/3.7gb], real usage: [4075745992/3.7gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=8989/8.7kb, in_flight_requests=0/0b, accounting=1803266/1.7mb]",
    "bytes_wanted": 4075745992,
    "bytes_limit": 4063657984,
    "durability": "PERMANENT"
  },
  "status": 429
}

My cluster has 3 master nodes and 3 data nodes. The cluster has 2 indexes and each index has 2 shards (with replica count set to 1).

From what I've read, this error indicates that I've reached 95% heap usage on at least one node. But when I add up the usage from the circuit breakers (request + fielddata + in_flight_requests + accounting), they never total more than ~20mb. So something else must be responsible for the memory usage.
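To make that arithmetic concrete, here is a minimal Python sketch that uses only the figures from the error message above. The 0.95 limit fraction is an assumption based on the 7.x default for the parent breaker when real-memory tracking is enabled. The function name `summarize` is made up for illustration.

```python
# Figures copied from the "reason" string in the error above.
usages = {
    "request": 0,
    "fielddata": 8989,
    "in_flight_requests": 0,
    "accounting": 1803266,
}
real_usage = 4075745992    # "real usage" reported by the parent breaker
parent_limit = 4063657984  # "limit of" reported by the parent breaker


def summarize(usages, real_usage, parent_limit, limit_fraction=0.95):
    """Compare what the child breakers track with the real heap usage.

    limit_fraction is an assumption (the default parent limit in 7.x
    when real-memory tracking is on), not something the error reports.
    """
    tracked = sum(usages.values())
    return {
        "tracked_bytes": tracked,
        "untracked_bytes": real_usage - tracked,
        "over_limit_bytes": real_usage - parent_limit,
        "implied_heap_bytes": round(parent_limit / limit_fraction),
    }


summary = summarize(usages, real_usage, parent_limit)
for name, value in summary.items():
    print(f"{name}: {value} ({value / 2**20:.1f} MiB)")
```

If the 95% assumption holds, the implied heap works out to 4277534720 bytes, which matches the in_flight_requests limit that the data nodes report further down. The child breakers account for less than 2 MiB, so nearly all of the heap in use is memory they do not track.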

I noticed that the cluster was in yellow status, which I traced to an allocation failure while assigning replicas to the units2 index. Removing the replicas returned the cluster to green, and the errors stopped for a while. This made me think that replication was using too much memory and causing the issue. To test this theory, I let the cluster run overnight without the replicas. Unfortunately, this morning I found thousands of new CircuitBreakingExceptions.

I'm not sure what to look at next and would appreciate any assistance you can provide. For some context, I've run several commands this morning to look for clues. I've copied the output of those commands below.

Here's the output of the GET /_cat/indices?v command:

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   units2    AHcurH6cTASFSj4AF1q7rQ   2   0   14147313      1107187      1.8gb          1.8gb
green  open   .kibana_2 _Uo50jEPQOK9iBGbh9zw4w   1   1                              3.7kb               
green  open   places2   L0T_uvxZR8maIVwO3d44hw   2   1       6356         2570     45.7mb         19.6mb
green  open   .kibana_1 OousjPfkSHeySiLefOdGOw   1   1                               283b               

The output of GET /_cat/shards?v:

index     shard prirep state       docs  store ip            node
.kibana_2 0     p      STARTED        1  3.7kb x.x.x.x 451a15942b572d7159f0736533a7533b
.kibana_2 0     r      STARTED        1  3.7kb x.x.x.x 7bcda2e106963bc7c4099a16d057b265
.kibana_1 0     r      STARTED        0   283b x.x.x.x 66c9c69b225cb26bb1988e6427d529a8
.kibana_1 0     p      STARTED        0   283b x.x.x.x 451a15942b572d7159f0736533a7533b
places2   1     r      STARTED     6405 12.8mb x.x.x.x 66c9c69b225cb26bb1988e6427d529a8
places2   1     p      STARTED     6405 13.3mb x.x.x.x 451a15942b572d7159f0736533a7533b
places2   0     p      STARTED     6356 19.6mb x.x.x.x 66c9c69b225cb26bb1988e6427d529a8
places2   0     r      STARTED     6356 13.3mb x.x.x.x 7bcda2e106963bc7c4099a16d057b265
units2    1     p      STARTED 14353629  2.1gb x.x.x.x 451a15942b572d7159f0736533a7533b
units2    0     p      STARTED 14147313  1.8gb x.x.x.x 7bcda2e106963bc7c4099a16d057b265

The output of GET /_cluster/stats:

{
  "_nodes" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "cluster_name" : "843863714247:search-00",
  "cluster_uuid" : "z-d6Y0FwRXikLrOSwBlPxg",
  "timestamp" : 1587481456218,
  "status" : "green",
  "indices" : {
    "count" : 4,
    "shards" : {
      "total" : 10,
      "primaries" : 6,
      "replication" : 0.6666666666666666,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 4,
          "avg" : 2.5
        },
        "primaries" : {
          "min" : 1,
          "max" : 2,
          "avg" : 1.5
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.75
        }
      }
    },
    "docs" : {
      "count" : 28513707,
      "deleted" : 3754420
    },
    "store" : {
      "size_in_bytes" : 4137770567
    },
    "fielddata" : {
      "memory_size_in_bytes" : 18752,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 1109584,
      "total_count" : 3410091,
      "hit_count" : 613985,
      "miss_count" : 2796106,
      "cache_size" : 65,
      "cache_count" : 36427,
      "evictions" : 36362
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 63,
      "memory_in_bytes" : 4344431,
      "terms_memory_in_bytes" : 1655163,
      "stored_fields_memory_in_bytes" : 1085760,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 128576,
      "points_memory_in_bytes" : 867196,
      "doc_values_memory_in_bytes" : 607736,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 102640,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 6,
      "coordinating_only" : 0,
      "data" : 3,
      "ingest" : 3,
      "master" : 3
    },
    "versions" : [ "7.4.2" ],
    "os" : {
      "available_processors" : 12,
      "allocated_processors" : 12,
      "names" : [ {
        "count" : 6
      } ],
      "pretty_names" : [ {
        "count" : 6
      } ],
      "mem" : {
        "total_in_bytes" : 35828772864,
        "free_in_bytes" : 4621955072,
        "used_in_bytes" : 31206817792,
        "free_percent" : 13,
        "used_percent" : 87
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 106
      },
      "open_file_descriptors" : {
        "min" : 1403,
        "max" : 1506,
        "avg" : 1445
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 2994403385,
      "mem" : {
        "heap_used_in_bytes" : 13192197672,
        "heap_max_in_bytes" : 19222757376
      },
      "threads" : 759
    },
    "fs" : {
      "total_in_bytes" : 656313581568,
      "free_in_bytes" : 644325363712,
      "available_in_bytes" : 644224700416
    },
    "network_types" : {
      "transport_types" : {
        "com.amazon.opendistroforelasticsearch.security.ssl.http.netty.OpenDistroSecuritySSLNettyTransport" : 6
      },
      "http_types" : {
        "filter-jetty" : 6
      }
    },
    "discovery_types" : {
      "zen" : 6
    },
    "packaging_types" : [ {
      "flavor" : "oss",
      "type" : "tar",
      "count" : 6
    } ]
  }
}

And the output of GET /_nodes/stats/breaker:

{
  "_nodes" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "cluster_name" : "843863714247:search-00",
  "nodes" : {
    "y4FWob1iTHmPJ3bxxmpCIA" : {
      "timestamp" : 1587482913298,
      "name" : "7bcda2e106963bc7c4099a16d057b265",
      "roles" : [ "ingest", "data" ],
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 2566520832,
          "limit_size" : "2.3gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1711013888,
          "limit_size" : "1.5gb",
          "estimated_size_in_bytes" : 9952,
          "estimated_size" : "9.7kb",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 4277534720,
          "limit_size" : "3.9gb",
          "estimated_size_in_bytes" : 1420,
          "estimated_size" : "1.3kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 4277534720,
          "limit_size" : "3.9gb",
          "estimated_size_in_bytes" : 1990055,
          "estimated_size" : "1.8mb",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 4063657984,
          "limit_size" : "3.7gb",
          "estimated_size_in_bytes" : 3982582080,
          "estimated_size" : "3.7gb",
          "overhead" : 1.0,
          "tripped" : 16374
        }
      }
    },
    "EVTZfay_Tpm8x1BAMs5Rww" : {
      "timestamp" : 1587482913295,
      "name" : "ad06ce6e0185ee6225f94c41b500cef7",
      "roles" : [ "master" ],
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 1278030643,
          "limit_size" : "1.1gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 852020428,
          "limit_size" : "812.5mb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 2130051072,
          "limit_size" : "1.9gb",
          "estimated_size_in_bytes" : 1420,
          "estimated_size" : "1.3kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 2130051072,
          "limit_size" : "1.9gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 2023548518,
          "limit_size" : "1.8gb",
          "estimated_size_in_bytes" : 315615624,
          "estimated_size" : "300.9mb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      }
    },
    "p3F4hA65SA6zTFn3DX3C1Q" : {
      "timestamp" : 1587482913295,
      "name" : "61b609648852523008e91f0abb928e08",
      "roles" : [ "master" ],
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 1278030643,
          "limit_size" : "1.1gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 852020428,
          "limit_size" : "812.5mb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 2130051072,
          "limit_size" : "1.9gb",
          "estimated_size_in_bytes" : 1420,
          "estimated_size" : "1.3kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 2130051072,
          "limit_size" : "1.9gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 2023548518,
          "limit_size" : "1.8gb",
          "estimated_size_in_bytes" : 408435048,
          "estimated_size" : "389.5mb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      }
    },
    "g40rmLcPROiMTaYyctxZUQ" : {
      "timestamp" : 1587482913295,
      "name" : "73041e4d0411a847d61efda09005db2c",
      "roles" : [ "master" ],
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 1278030643,
          "limit_size" : "1.1gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 852020428,
          "limit_size" : "812.5mb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 2130051072,
          "limit_size" : "1.9gb",
          "estimated_size_in_bytes" : 1420,
          "estimated_size" : "1.3kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 2130051072,
          "limit_size" : "1.9gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 2023548518,
          "limit_size" : "1.8gb",
          "estimated_size_in_bytes" : 378401112,
          "estimated_size" : "360.8mb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      }
    },
    "mWawkbG0QoyC177ReT-o_w" : {
      "timestamp" : 1587482913296,
      "name" : "451a15942b572d7159f0736533a7533b",
      "roles" : [ "ingest", "data" ],
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 2566520832,
          "limit_size" : "2.3gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1711013888,
          "limit_size" : "1.5gb",
          "estimated_size_in_bytes" : 10104,
          "estimated_size" : "9.8kb",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 4277534720,
          "limit_size" : "3.9gb",
          "estimated_size_in_bytes" : 1420,
          "estimated_size" : "1.3kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 4277534720,
          "limit_size" : "3.9gb",
          "estimated_size_in_bytes" : 2084787,
          "estimated_size" : "1.9mb",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 4063657984,
          "limit_size" : "3.7gb",
          "estimated_size_in_bytes" : 3930358832,
          "estimated_size" : "3.6gb",
          "overhead" : 1.0,
          "tripped" : 15345
        }
      }
    },
    "wFKrV_BYREeJuNqPQzyIBQ" : {
      "timestamp" : 1587482913295,
      "name" : "66c9c69b225cb26bb1988e6427d529a8",
      "roles" : [ "ingest", "data" ],
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 2566520832,
          "limit_size" : "2.3gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1711013888,
          "limit_size" : "1.5gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 4277534720,
          "limit_size" : "3.9gb",
          "estimated_size_in_bytes" : 82200,
          "estimated_size" : "80.2kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 4277534720,
          "limit_size" : "3.9gb",
          "estimated_size_in_bytes" : 444065,
          "estimated_size" : "433.6kb",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 4063657984,
          "limit_size" : "3.7gb",
          "estimated_size_in_bytes" : 3975447168,
          "estimated_size" : "3.7gb",
          "overhead" : 1.0,
          "tripped" : 599
        }
      }
    }
  }
}
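
To compare nodes at a glance, a response like the one above can be reduced to one row per node. This is only a sketch: `parent_breaker_report` is a hypothetical helper, and `sample` is a trimmed copy of two nodes from the response above (one data node, one master node), not the full output.

```python
def parent_breaker_report(stats):
    """Summarize a parsed GET /_nodes/stats/breaker response.

    Returns one dict per node, sorted so the node closest to its
    parent breaker limit comes first.
    """
    rows = []
    for node in stats["nodes"].values():
        breakers = node["breakers"]
        parent = breakers["parent"]
        # Sum the estimates of every child breaker (everything but "parent").
        child_total = sum(
            b["estimated_size_in_bytes"]
            for name, b in breakers.items()
            if name != "parent"
        )
        rows.append({
            "name": node["name"],
            "roles": node["roles"],
            "parent_used_fraction": parent["estimated_size_in_bytes"]
            / parent["limit_size_in_bytes"],
            "parent_tripped": parent["tripped"],
            "child_estimated_bytes": child_total,
        })
    return sorted(rows, key=lambda r: r["parent_used_fraction"], reverse=True)


# Trimmed sample: only the fields the helper reads, values copied from above.
sample = {
    "nodes": {
        "y4FWob1iTHmPJ3bxxmpCIA": {
            "name": "7bcda2e106963bc7c4099a16d057b265",
            "roles": ["ingest", "data"],
            "breakers": {
                "request": {"estimated_size_in_bytes": 0},
                "fielddata": {"estimated_size_in_bytes": 9952},
                "in_flight_requests": {"estimated_size_in_bytes": 1420},
                "accounting": {"estimated_size_in_bytes": 1990055},
                "parent": {
                    "limit_size_in_bytes": 4063657984,
                    "estimated_size_in_bytes": 3982582080,
                    "tripped": 16374,
                },
            },
        },
        "EVTZfay_Tpm8x1BAMs5Rww": {
            "name": "ad06ce6e0185ee6225f94c41b500cef7",
            "roles": ["master"],
            "breakers": {
                "request": {"estimated_size_in_bytes": 0},
                "fielddata": {"estimated_size_in_bytes": 0},
                "in_flight_requests": {"estimated_size_in_bytes": 1420},
                "accounting": {"estimated_size_in_bytes": 0},
                "parent": {
                    "limit_size_in_bytes": 2023548518,
                    "estimated_size_in_bytes": 315615624,
                    "tripped": 0,
                },
            },
        },
    }
}

rows = parent_breaker_report(sample)
for row in rows:
    print(
        f"{row['name']} {row['roles']}: "
        f"parent at {row['parent_used_fraction']:.1%} of limit, "
        f"tripped {row['parent_tripped']} times, "
        f"child breakers total {row['child_estimated_bytes']} bytes"
    )
```

On the sample, the data node sits within a few percent of its parent limit while its child breakers total about 2 MB. The master node is far below its limit and has never tripped.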
