Elasticsearch parent breaker tripping during high indexing usage

I am debugging an Elasticsearch 7.17.3 installation that is persistently running out of memory and tripping the parent circuit breaker. Here is an example of the error:

elasticsearch.exceptions.TransportError: TransportError(429, 'circuit_breaking_exception', '[parent] Data too large, data for [<http_request>] would be [8432214884/7.8gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8432214672/7.8gb], new bytes reserved: [212/212b], usages [request=16440/16kb, fielddata=15261674/14.5mb, in_flight_requests=212/212b, model_inference=0/0b, eql_sequence=0/0b, accounting=61581360/58.7mb]')

I've more than doubled the JVM heap size, from 3 GB to 8 GB, but the memory issues are unchanged. The other surprising thing is that these errors occur during periods of heavy indexing load, yet they always seem to be triggered by a search call to ES, not an index call.
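
For reference, the error above comes from the Python elasticsearch client. This is a minimal sketch of how such a 429 can be retried with backoff while I debug; the host, index, and retry settings are placeholders, not my real values:

import time
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import TransportError

es = Elasticsearch("http://localhost:9200")  # placeholder host

def search_with_backoff(index, body, retries=5, delay=2.0):
    """Retry a search that the parent breaker rejected with HTTP 429."""
    for attempt in range(retries):
        try:
            return es.search(index=index, body=body)
        except TransportError as err:
            if err.status_code != 429:
                raise  # some other failure; don't mask it
            # Breaker rejection: back off and let heap pressure subside.
            time.sleep(delay * (2 ** attempt))
    raise RuntimeError("search still rejected by the circuit breaker after retries")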

For debugging, here is the output from the _cat/nodes endpoint:

name                   id   node.role   heap.current heap.percent heap.max
elasticsearch-master-0 uLpr cdfhilmrstw        5.6gb           70      8gb
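
(That table comes from the cat nodes API; with the Python client, a roughly equivalent call is the sketch below, with the column list matching the headers above and a placeholder host.)

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Same columns as the table above, with a header row.
print(es.cat.nodes(h="name,id,node.role,heap.current,heap.percent,heap.max", v=True))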

And the node breaker stats, from the _nodes/stats/breaker endpoint:

{
  "uLprTtGlRWq-L2mi-mNeFg": {
    "timestamp": 1709934029351,
    "name": "elasticsearch-master-0",
    "transport_address": "10.1.1.148:9300",
    "host": "10.1.1.148",
    "ip": "10.1.1.148:9300",
    "roles": [
      "data",
      "data_cold",
      "data_content",
      "data_frozen",
      "data_hot",
      "data_warm",
      "ingest",
      "master",
      "ml",
      "remote_cluster_client",
      "transform"
    ],
    "attributes": {
      "ml.machine_memory": "17179869184",
      "xpack.installed": "true",
      "transform.node": "true",
      "ml.max_open_jobs": "512",
      "ml.max_jvm_size": "8589934592"
    },
    "breakers": {
      "request": {
        "limit_size_in_bytes": 5153960755,
        "limit_size": "4.7gb",
        "estimated_size_in_bytes": 0,
        "estimated_size": "0b",
        "overhead": 1,
        "tripped": 0
      },
      "fielddata": {
        "limit_size_in_bytes": 3435973836,
        "limit_size": "3.1gb",
        "estimated_size_in_bytes": 0,
        "estimated_size": "0b",
        "overhead": 1.03,
        "tripped": 0
      },
      "in_flight_requests": {
        "limit_size_in_bytes": 8589934592,
        "limit_size": "8gb",
        "estimated_size_in_bytes": 0,
        "estimated_size": "0b",
        "overhead": 2,
        "tripped": 0
      },
      "model_inference": {
        "limit_size_in_bytes": 4294967296,
        "limit_size": "4gb",
        "estimated_size_in_bytes": 0,
        "estimated_size": "0b",
        "overhead": 1,
        "tripped": 0
      },
      "eql_sequence": {
        "limit_size_in_bytes": 4294967296,
        "limit_size": "4gb",
        "estimated_size_in_bytes": 0,
        "estimated_size": "0b",
        "overhead": 1,
        "tripped": 0
      },
      "accounting": {
        "limit_size_in_bytes": 8589934592,
        "limit_size": "8gb",
        "estimated_size_in_bytes": 54473076,
        "estimated_size": "51.9mb",
        "overhead": 1,
        "tripped": 0
      },
      "parent": {
        "limit_size_in_bytes": 8160437862,
        "limit_size": "7.5gb",
        "estimated_size_in_bytes": 5529302008,
        "estimated_size": "5.1gb",
        "overhead": 1,
        "tripped": 3186
      }
    }
  }
}
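
One thing I did work out from these numbers: the 7.5gb parent limit is exactly 95% of the 8gb heap, which I believe is the 7.x default for indices.breaker.total.limit when real-memory accounting is enabled. A quick check, assuming default breaker settings:

# ml.max_jvm_size above: 8589934592 bytes = 8 GB heap
heap_bytes = 8589934592
parent_limit = int(heap_bytes * 0.95)   # default indices.breaker.total.limit = 95%
print(parent_limit)                     # 8160437862 -> matches limit_size_in_bytes ("7.5gb")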

I'm trying to piece together what is happening and how to resolve it. Is the garbage collector struggling to keep up? Is there a remediation besides adding still more memory?
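
In case it helps, this is a sketch of the check I have in mind for the GC question, using the same Python client and the standard _nodes/stats JVM fields (host is a placeholder):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Old-generation GC counts/times climbing alongside heap_used_percent would
# suggest the collector is struggling to reclaim memory.
stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    old_gc = node["jvm"]["gc"]["collectors"]["old"]
    print(node["name"],
          "heap_used_percent:", node["jvm"]["mem"]["heap_used_percent"],
          "old_gc_count:", old_gc["collection_count"],
          "old_gc_time_ms:", old_gc["collection_time_in_millis"])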

What is the average size of your documents? What bulk size are you using? How many concurrent indexing threads are you using? What is the specification of your cluster in terms of nodes and hardware? What type of storage are you using? Local SSDs?
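
To be concrete about the bulk size and thread count questions: with the Python client, the knobs I mean look roughly like this (a sketch; the host, index name, documents, and values are placeholders):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder host

def actions(docs, index):
    for doc in docs:
        yield {"_index": index, "_source": doc}

docs = [{"field": i} for i in range(10000)]  # placeholder documents

# chunk_size is the bulk size (documents per bulk request);
# thread_count is the number of concurrent indexing threads.
for ok, item in helpers.parallel_bulk(
    es, actions(docs, "my-index"), thread_count=4, chunk_size=500
):
    if not ok:
        print("bulk item failed:", item)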
