Elasticsearch after upgrade on 7.7.0 starts CircuitBreakingExceptions and nodes leaves/rejoin cluster

Hi, we have an issue when performing an upgrade on one of our clusters. We are upgrading elasticsearch from version 6.8.1 to 7.7.0 (we have other clusters on this version).

The cluster has 3 master nodes, 5 data nodes and 2 ingest(client) nodes. After we upgraded (rolling upgrade) 3 of the data nodes cluster is unstable and upgraded nodes started to leave and reconnect which results in flapping YELLOW to GREEN state. From logs on the elected master node, we can see that the reason is CircuitBreakingException.

[2021-08-18T14:55:03,155][INFO ][o.e.c.r.a.AllocationService] [master-es11-master2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[fb_profile_v1_2020-11][0]] ...]).
[2021-08-18T14:56:18,004][WARN ][o.e.c.r.a.AllocationService] [master-es11-master2] failing shard [failed shard, shard [fb_profile_v1_2020-11][0], node[15PSvp8ZTeieJrlSMz45zQ], [R], s[STARTED], a[id=HeYOA60wRkyfPc1YQH_qKw], message [failed to perform indices:data/write/bulk[s] on replica [fb_profile_v1_2020-11][0], node[15PSvp8ZTeieJrlSMz45zQ], [R], s[STARTED], a[id=HeYOA60wRkyfPc1YQH_qKw]], failure [RemoteTransportException[[es11-live6][10.64.10.249:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15699093646/14.6gb], which is larger than the limit of [15627557273/14.5gb], real usage: [15699091456/14.6gb], new bytes reserved: [2190/2.1kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=2190/2.1kb, accounting=26768996/25.5mb]]; ], markAsStale [true]]
org.elasticsearch.transport.RemoteTransportException: [es11-live6][10.64.10.249:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [15699093646/14.6gb], which is larger than the limit of [15627557273/14.5gb], real usage: [15699091456/14.6gb], new bytes reserved: [2190/2.1kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=2190/2.1kb, accounting=26768996/25.5mb]

and

[2021-08-18T14:56:18,059][WARN ][o.e.g.G.InternalReplicaShardAllocator] [master-es11-master2] [fb_profile_v1_2020-11][0]: failed to list shard for shard_store on node [15PSvp8ZTeieJrlSMz45zQ]
org.elasticsearch.action.FailedNodeException: Failed node [15PSvp8ZTeieJrlSMz45zQ]

I think the second one is responsible for nodes leaving the cluster because the master node cannot contact them due to a circuit breaker triggered on the data node.

In /nodes API we see:

"type": "failed_node_exception",
"reason": "Failed node [3g9aBFGkTQWpXduA02aOEQ]",
"node_id": "3g9aBFGkTQWpXduA02aOEQ",
"caused_by": {
"type": "circuit_breaking_exception",
"reason": "[parent] Data too large, data for [<transport_request>] would be [16540220516/15.4gb], which is larger than the limit of [15794910003/14.7gb], real usage: [16540218400/15.4gb], new bytes reserved: [2116/2kb], usages [request=0/0b, fielddata=5446/5.3kb, in_flight_requests=2116/2kb, accounting=4157692/3.9mb]",
"bytes_wanted": 16540220516,
"bytes_limit": 15794910003

when I am trying to check heap size it looks stable most of the time below 70%. This behavior still occurs after the cluster is completely upgraded. Looks stable for a couple of hours, but after that, we see that shards are unallocated and one of node missing (elasticsearch process still runs on node, but not responding to master), in few minutes node gets back to cluster.

Threadpool after node rejoins cluster:

es11-live1   analyze             0 0 0
es11-live1   ccr                 0 0 0
es11-live1   fetch_shard_started 0 0 0
es11-live1   fetch_shard_store   0 0 0
es11-live1   flush               0 0 0
es11-live1   force_merge         0 0 0
es11-live1   generic             1 0 0
es11-live1   get                 0 0 0
es11-live1   listener            0 0 0
es11-live1   management          1 0 0
es11-live1   ml_datafeed         0 0 0
es11-live1   ml_job_comms        0 0 0
es11-live1   ml_utility          0 0 0
es11-live1   refresh             0 0 0
es11-live1   rollup_indexing     0 0 0
es11-live1   search              1 0 0
es11-live1   search_throttled    0 0 0
es11-live1   security-token-key  0 0 0
es11-live1   snapshot            0 0 0
es11-live1   transform_indexing  0 0 0
es11-live1   warmer              0 0 0
es11-live1   watcher             0 0 0
es11-live1   write               0 0 0

Have no idea what’s causing this, and will appreciate any help. (can provide more info if needed). I know that CircuitBreakerException can happen sometimes, but why reject the master's health check and disconnect node from the cluster?

Data nodes spec: 8CPU, 30G mem, JVM heap size set to 15687M

jvm.options:

8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

We use only minimal configuration changes:

cluster.name: es11
gateway.expected_nodes: 3
gateway.recover_after_time: 5m
gateway.recover_after_nodes: 2
gateway.recover_after_master_nodes: 3
discovery.seed_hosts: [ "es11-master1", "es11-master2", "es11-master3"  ]
cluster.initial_master_nodes: [ "es11-master1", "es11-master2", "es11-master3"  ]
node.name: es11-live1
node.master: false
node.data: true
node.attr.tag: live
path.data: /srv/elasticsearch
path.logs: /var/log/elasticsearch
action.destructive_requires_name: true
bootstrap.memory_lock: false
reindex.remote.whitelist: 0.0.0.0:9200
transport.compress: true
network.host: 10.64.10.135
http.host: 0.0.0.0
http.port: 9200
http.publish_host: 10.64.10.135
indices.recovery.max_bytes_per_sec: 200mb
indices.memory.index_buffer_size: 50%
cluster.routing.allocation.node_initial_primaries_recoveries: 20
cluster.routing.allocation.disk.threshold_enabled: false

We are also looking at "indices.memory.index_buffer_size: 50%" which is an option that we tuned in many versions before (like v2 maybe), can this be the reason for this, because we did not touch it in a new version and from docs we see that default is on 10%?

Thanks
Jiri

Just a quick comment to say that 7.7 is nearly EOL, so it's great to get to 7.X but please look at upgrading to 7.14 (latest) ASAP :slight_smile:

Can you share the output from the _cluster/stats?pretty&human API?

{
  "_nodes" : {
    "total" : 10,
    "successful" : 9,
    "failed" : 1,
    "failures" : [
      {
        "type" : "failed_node_exception",
        "reason" : "Failed node [uJ9AYYplT3abkK9ADOkF9Q]",
        "node_id" : "uJ9AYYplT3abkK9ADOkF9Q",
        "caused_by" : {
          "type" : "circuit_breaking_exception",
          "reason" : "[parent] Data too large, data for [<transport_request>] would be [16382516568/15.2gb], which is larger than the limit of [15794910003/14.7gb], real usage: [16382514688/15.2gb], new bytes reserved: [1880/1.8kb], usages [request=0/0b, fielddata=12425/12.1kb, in_flight_requests=1880/1.8kb, accounting=10597956/10.1mb]",
          "bytes_wanted" : 16382516568,
          "bytes_limit" : 15794910003,
          "durability" : "PERMANENT"
        }
      }
    ]
  },
  "cluster_name" : "es11",
  "cluster_uuid" : "GNLXSE8OTZqlm1k7si1Mxg",
  "timestamp" : 1629717646472,
  "status" : "yellow",
  "indices" : {
    "count" : 43,
    "shards" : {
      "total" : 89,
      "primaries" : 44,
      "replication" : 1.0227272727272727,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 9,
          "avg" : 2.0697674418604652
        },
        "primaries" : {
          "min" : 0,
          "max" : 5,
          "avg" : 1.0232558139534884
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.7348837209302326
        }
      }
    },
    "docs" : {
      "count" : 2782156946,
      "deleted" : 670537228
    },
    "store" : {
      "size" : "346.7gb",
      "size_in_bytes" : 372361996657
    },
    "fielddata" : {
      "memory_size" : "81.5kb",
      "memory_size_in_bytes" : 83512,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "551.8mb",
      "memory_size_in_bytes" : 578680176,
      "total_count" : 3496760,
      "hit_count" : 689629,
      "miss_count" : 2807131,
      "cache_size" : 3607,
      "cache_count" : 4467,
      "evictions" : 860
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 1830,
      "memory" : "54.5mb",
      "memory_in_bytes" : 57172776,
      "terms_memory" : "2.1mb",
      "terms_memory_in_bytes" : 2285280,
      "stored_fields_memory" : "44.9mb",
      "stored_fields_memory_in_bytes" : 47082952,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "24kb",
      "norms_memory_in_bytes" : 24576,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "7.4mb",
      "doc_values_memory_in_bytes" : 7779968,
      "index_writer_memory" : "13.7mb",
      "index_writer_memory_in_bytes" : 14369892,
      "version_map_memory" : "3mb",
      "version_map_memory_in_bytes" : 3187095,
      "fixed_bit_set" : "758.1mb",
      "fixed_bit_set_memory_in_bytes" : 794977944,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "boolean",
          "count" : 11,
          "index_count" : 3
        },
        {
          "name" : "date",
          "count" : 78,
          "index_count" : 41
        },
        {
          "name" : "double",
          "count" : 52,
          "index_count" : 13
        },
        {
          "name" : "float",
          "count" : 39,
          "index_count" : 13
        },
        {
          "name" : "geo_shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "integer",
          "count" : 700,
          "index_count" : 28
        },
        {
          "name" : "keyword",
          "count" : 557,
          "index_count" : 43
        },
        {
          "name" : "long",
          "count" : 142,
          "index_count" : 29
        },
        {
          "name" : "nested",
          "count" : 355,
          "index_count" : 26
        },
        {
          "name" : "object",
          "count" : 47,
          "index_count" : 4
        },
        {
          "name" : "text",
          "count" : 54,
          "index_count" : 5
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [ ],
      "analyzer_types" : [ ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [ ],
      "built_in_filters" : [ ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 9,
      "coordinating_only" : 0,
      "data" : 4,
      "ingest" : 9,
      "master" : 3,
      "ml" : 9,
      "remote_cluster_client" : 9,
      "transform" : 4,
      "voting_only" : 0
    },
    "versions" : [
      "7.7.0"
    ],
    "os" : {
      "available_processors" : 42,
      "allocated_processors" : 42,
      "names" : [
        {
          "name" : "Linux",
          "count" : 9
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Ubuntu 16.04.4 LTS",
          "count" : 2
        },
        {
          "pretty_name" : "Ubuntu 16.04.6 LTS",
          "count" : 7
        }
      ],
      "mem" : {
        "total" : "176.5gb",
        "total_in_bytes" : 189606993920,
        "free" : "14.2gb",
        "free_in_bytes" : 15337373696,
        "used" : "162.3gb",
        "used_in_bytes" : 174269620224,
        "free_percent" : 8,
        "used_percent" : 92
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 113
      },
      "open_file_descriptors" : {
        "min" : 467,
        "max" : 816,
        "avg" : 644
      }
    },
    "jvm" : {
      "max_uptime" : "4d",
      "max_uptime_in_millis" : 347587230,
      "versions" : [
        {
          "version" : "14",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "14+36",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 9
        }
      ],
      "mem" : {
        "heap_used" : "37.6gb",
        "heap_used_in_bytes" : 40378267848,
        "heap_max" : "88.2gb",
        "heap_max_in_bytes" : 94805950464
      },
      "threads" : 575
    },
    "fs" : {
      "total" : "1.1tb",
      "total_in_bytes" : 1282345390080,
      "free" : "807.8gb",
      "free_in_bytes" : 867466362880,
      "available" : "807.8gb",
      "available_in_bytes" : 867382476800
    },
    "plugins" : [
      {
        "name" : "analysis-icu",
        "version" : "7.7.0",
        "elasticsearch_version" : "7.7.0",
        "java_version" : "1.8",
        "description" : "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.",
        "classname" : "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      },
      {
        "name" : "mapper-size",
        "version" : "7.7.0",
        "elasticsearch_version" : "7.7.0",
        "java_version" : "1.8",
        "description" : "The Mapper Size plugin allows document to record their uncompressed size at index time.",
        "classname" : "org.elasticsearch.plugin.mapper.MapperSizePlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 9
      },
      "http_types" : {
        "security4" : 9
      }
    },
    "discovery_types" : {
      "zen" : 9
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "deb",
        "count" : 9
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 3,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "rename" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}

We managed to change that: indices.memory.index_buffer_size: 50% on defaults 10%, but looks like that does not help. Don't understand why CircuitBreaker does not have exceptions for internal elasticsearch communication like "failed to list shard for shard_store on node" which triggers node leaving cluster.

We want to upgrade on latest version, but currently all our clusters running 7.7.0, so after we complete upgrading to this version, we will proceed to lastest (thanks for point on EOL)

failed to list shard for shard_store on node does not trigger a node leaving the cluster.

You're tripping the real memory circuit breaker which means the heap on this node is over 95% full. https://github.com/elastic/elasticsearch/pull/58674 helps a bit (7.9.0) by encouraging a GC before completely giving up, but really it seems that heap usage is out of control. Nodes won't do much useful work when they're so close to the limit.

It's possible that during the upgrade you've carried over some old/obsolete settings in your jvm.options file that misconfigure your GC, so check for that. Otherwise I'd suggest taking heap dumps to see what is overloading these nodes so much.

Thanks for reply, we use default jvm.options that goes with elasticsearch deb package (i posted jvm.options in first post - without empty lines and comments).

Yes aggree that we have an issue with full heap and we will try to investigate that heap dump, but still have no idea why nodes leaving cluster when circuit breaker is triggered. In logs i cannot see any messages about node leaves cluster, but in nodes API that node missing, and all his shards are unassigned, after some time he "rejoins" and allocates all shards again.

In logs we can see only:

"type": "server", "timestamp": "2021-08-23T12:36:53,080Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "es11", "node.name": "es11-master1", "message": "failing shard [failed shard, shard [fb_profile_v1_2021-08][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=aHJb6Xi2TnaYzbgMpRNIhA], message [failed to perform indices:data/write/bulk[s] on replica [fb_profile_v1_2021-08][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=aHJb6Xi2TnaYzbgMpRNIhA]], failure [RemoteTransportException[[es11-live5][10.64.11.254:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15895302706/14.8gb], which is larger than the limit of [15794910003/14.7gb], real usage: [15895301056/14.8gb], new bytes reserved: [1650/1.6kb], usages [request=0/0b, fielddata=15326/14.9kb, in_flight_requests=6284/6.1kb, accounting=11803656/11.2mb]]; ], markAsStale [true]]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,082Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "es11", "node.name": "es11-master1", "message": "Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[fb_profile_v1_2021-08][0]]]).", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA"  }
{"type": "server", "timestamp": "2021-08-23T12:36:53,103Z", "level": "WARN", "component": "o.e.g.G.InternalReplicaShardAllocator", "cluster.name": "es11", "node.name": "es11-master1", "message": "[fb_profile_v1_2021-08][0]: failed to list shard for shard_store on node [zCmLHt8TSHejBiSRys39GA]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,294Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "es11", "node.name": "es11-master1", "message": "failing shard [failed shard, shard [fb_profile_v1_2019-06][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=UrVi6zJ3SQW-RK7xDQ_njA], message [failed to perform indices:data/write/bulk[s] on replica [fb_profile_v1_2019-06][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=UrVi6zJ3SQW-RK7xDQ_njA]], failure [RemoteTransportException[[es11-live5][10.64.11.254:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15895305690/14.8gb], which is larger than the limit of [15794910003/14.7gb], real usage: [15895301056/14.8gb], new bytes reserved: [4634/4.5kb], usages [request=0/0b, fielddata=15326/14.9kb, in_flight_requests=6284/6.1kb, accounting=11803656/11.2mb]]; ], markAsStale [true]]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,296Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "es11", "node.name": "es11-master1", "message": "failing shard [failed shard, shard [fb_profile_v1_2017-02][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=Vj02ERIQSGGJy_IQqeyeLQ], message [failed to perform indices:data/write/bulk[s] on replica [fb_profile_v1_2017-02][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=Vj02ERIQSGGJy_IQqeyeLQ]], failure [RemoteTransportException[[es11-live5][10.64.11.254:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15912089422/14.8gb], which is larger than the limit of [15794910003/14.7gb], real usage: [15912078272/14.8gb], new bytes reserved: [11150/10.8kb], usages [request=0/0b, fielddata=15326/14.9kb, in_flight_requests=11150/10.8kb, accounting=11803656/11.2mb]]; ], markAsStale [true]]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,297Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "es11", "node.name": "es11-master1", "message": "failing shard [failed shard, shard [fb_profile_v1_2020-11][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=gndScyk0QSuCYb-8EE5VNQ], message [failed to perform indices:data/write/bulk[s] on replica [fb_profile_v1_2020-11][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=gndScyk0QSuCYb-8EE5VNQ]], failure [RemoteTransportException[[es11-live5][10.64.11.254:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15912095072/14.8gb], which is larger than the limit of [15794910003/14.7gb], real usage: [15912078272/14.8gb], new bytes reserved: [16800/16.4kb], usages [request=0/0b, fielddata=15326/14.9kb, in_flight_requests=16800/16.4kb, accounting=11803656/11.2mb]]; ], markAsStale [true]]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,298Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "es11", "node.name": "es11-master1", "message": "failing shard [failed shard, shard [fb_profile_v1_2020-02][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=0rqEdJYtTfaMNwo0LmCkeQ], message [failed to perform indices:data/write/bulk[s] on replica [fb_profile_v1_2020-02][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=0rqEdJYtTfaMNwo0LmCkeQ]], failure [RemoteTransportException[[es11-live5][10.64.11.254:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15903699364/14.8gb], which is larger than the limit of [15794910003/14.7gb], real usage: [15903689664/14.8gb], new bytes reserved: [9700/9.4kb], usages [request=0/0b, fielddata=15326/14.9kb, in_flight_requests=13390/13kb, accounting=11803656/11.2mb]]; ], markAsStale [true]]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,299Z", "level": "WARN", "component": "o.e.c.r.a.AllocationService", "cluster.name": "es11", "node.name": "es11-master1", "message": "failing shard [failed shard, shard [fb_profile_v1_2016-05][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=JE_KH69TQHSkPSOgdBxqfg], message [failed to perform indices:data/write/bulk[s] on replica [fb_profile_v1_2016-05][0], node[zCmLHt8TSHejBiSRys39GA], [R], s[STARTED], a[id=JE_KH69TQHSkPSOgdBxqfg]], failure [RemoteTransportException[[es11-live5][10.64.11.254:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15916290160/14.8gb], which is larger than the limit of [15794910003/14.7gb], real usage: [15916272576/14.8gb], new bytes reserved: [17584/17.1kb], usages [request=0/0b, fielddata=15326/14.9kb, in_flight_requests=17584/17.1kb, accounting=11803656/11.2mb]]; ], markAsStale [true]]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,318Z", "level": "WARN", "component": "o.e.g.G.InternalReplicaShardAllocator", "cluster.name": "es11", "node.name": "es11-master1", "message": "[fb_profile_v1_2019-06][0]: failed to list shard for shard_store on node [zCmLHt8TSHejBiSRys39GA]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,323Z", "level": "WARN", "component": "o.e.g.G.InternalReplicaShardAllocator", "cluster.name": "es11", "node.name": "es11-master1", "message": "[fb_profile_v1_2017-02][0]: failed to list shard for shard_store on node [zCmLHt8TSHejBiSRys39GA]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,
{"type": "server", "timestamp": "2021-08-23T12:36:53,329Z", "level": "WARN", "component": "o.e.g.G.InternalReplicaShardAllocator", "cluster.name": "es11", "node.name": "es11-master1", "message": "[fb_profile_v1_2020-11][0]: failed to list shard for shard_store on node [zCmLHt8TSHejBiSRys39GA]", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "n--psKv0Q-G6wv10dqRWBA" ,

i trim that logs from stacktrace (just first lines). That was from master node and on data node just this single log message:

{"type": "server", "timestamp": "2021-08-23T12:07:15,952Z", "level": "INFO", "component": "o.e.i.s.TransportNodesListShardStoreMetaData", "cluster.name": "es11", "node.name": "es11-live3", "message": "[fb_profile_v1_2020-11][0]: failed to obtain shard lock", "cluster.uuid": "GNLXSE8OTZqlm1k7si1Mxg", "node.id": "uJ9AYYplT3abkK9ADOkF9Q" ,
"stacktrace": ["org.elasticsearch.env.ShardLockObtainFailedException: [fb_profile_v1_2020-11][0]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [read metadata snapshot]",
"at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:771) ~[elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:686) ~[elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:434) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:173) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:117) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:67) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:129) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:244) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:240) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) [x-pack-security-7.7.0.jar:7.7.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) [x-pack-security-7.7.0.jar:7.7.0]",
"at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.7.0.jar:7.7.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.0.jar:7.7.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
"at java.lang.Thread.run(Thread.java:832) [?:?]"] }

Are you sure the node is really leaving the cluster? The master would always log a message containing the string node-left if so. From your description it sounds like the node remains in the cluster but can't handle writes or respond to stats requests because its heap is too full. If it rejects a write then it unassigns the target shard (since it's now missing a write, i.e. it's not in sync any more), and if it can't respond to stats requests then you won't see it in the nodes API.

You are correct, it's not leaving in a traditional statement, but it's not responding to any stats request (so it looks like it leaves).

I'll try to turn on that heap dump, hope it will tell us more.

I just found out that heap dump will not work, or at least from

-XX:+HeapDumpOnOutOfMemoryError

because from what i understand that circuit breaker is triggered before heap goes out of mem, so no dumps cant be found in default -XX:HeapDumpPath=/var/lib/elasticsearch

Yes you'll need to collect a heap dump yourself using jmap.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.