Since upgrading to 8.17.2, and still now on 8.18.0, snapshots to S3 are causing nodes to drop out of the cluster with the 'followers check retry count exceeded' error. So far it seems to happen only on the cold data nodes, but I haven't managed a completely successful snapshot since the issue started, so the hot/warm nodes may be affected as well. Also, when the dropped node rejoins the cluster, a lot of shard rebalancing kicks off. None of this happened prior to 8.17.2.
The cold nodes are all physical servers with 8 CPUs, at least 32 GB of memory, and SSD storage. The coordinator nodes are virtual machines with 8 vCPUs and 16 GB of memory. The cold nodes hold between 800 and 850 indices, with 1 primary and 1 replica per index. I've ruled out a hardware problem because the node that drops from the cluster is not always the same one.
Right around the time the nodes drop from the cluster, garbage collection and timer thread warnings show up. From the data node:
[2025-05-07T10:30:24,327][INFO ][o.e.m.j.JvmGcMonitorService] [data1] [gc][80668] overhead, spent [324ms] collecting in the last [1s]
[2025-05-07T10:32:08,293][INFO ][o.e.m.j.JvmGcMonitorService] [data1] [gc][80771] overhead, spent [490ms] collecting in the last [1s]
[2025-05-07T10:33:01,460][WARN ][o.e.m.j.JvmGcMonitorService] [data1] [gc][G1 Concurrent GC][80789][167] duration [35.7s], collections [1]/[35.9s], total [35.7s]/[9.5m], memory [6.8gb]->[6.8gb]/[19.4gb], all_pools {[CodeHeap 'non-nmethods'] [2.6mb]->[2.6mb]/[5.5mb]}{[Metaspace] [197.9mb]->[197.9mb]/[0b]}{[CodeHeap 'profiled nmethods'] [18mb]->[17.9mb]/[117.2mb]}{[Compressed Class Space] [24mb]->[24mb]/[1gb]}{[young] [1.3gb]->[1.3gb]/[0b]}{[old] [5.4gb]->[5.4gb]/[19.4gb]}{[survivor] [43.6mb]->[43.6mb]/[0b]}{[CodeHeap 'non-profiled nmethods'] [33.3mb]->[33.3mb]/[117.2mb]}
[2025-05-07T10:33:01,461][WARN ][o.e.m.j.JvmGcMonitorService] [data1] [gc][80789] overhead, spent [35.7s] collecting in the last [35.9s]
[2025-05-07T10:33:01,644][WARN ][o.e.t.ThreadPool ] [data1] timer thread slept for [35.9s/35911ms] on absolute clock which is above the warn threshold of [5000ms]
[2025-05-07T10:33:01,660][WARN ][o.e.t.ThreadPool ] [data1] timer thread slept for [35.9s/35911037136ns] on relative clock which is above the warn threshold of [5000ms]
[2025-05-07T10:33:04,678][INFO ][o.e.c.c.Coordinator ] [data1] [3] consecutive checks of the master node [{coord3}{j1EVgFV6Seieg20FZWiGDQ}{xTohzQf3R6O4yW5oGI9Pjw}{coord3}{192.168.1.26}{192.168.1.26:9300}{mt}{8.18.0}{7000099-8525000}] were unsuccessful ([3] rejected, [0] timed out), restarting discovery; more details may be available in the master node logs [last unsuccessful check: rejecting check since [{data1}{RgMem9IQSeGRgi-Xu40SwA}{ePJgynj6SZ6f3s8kiMcW-Q}{data1}{192.168.1.40}{192.168.1.40:9300}{cfs}{8.18.0}{7000099-8525000}] has been removed from the cluster]
From the master node at the same time:
[2025-05-07T10:32:57,845][INFO ][o.e.c.c.C.CoordinatorPublication] [coord3] after [10s] publication of cluster state version [5539940] is still waiting for {data1}{RgMem9IQSeGRgi-Xu40SwA}{ePJgynj6SZ6f3s8kiMcW-Q}{data1}{192.168.1.40}{192.168.1.40:9300}{cfs}{8.18.0}{7000099-8525000}{xpack.installed=true, ml.config_version=12.0.0, transform.config_version=10.0.0} [SENT_PUBLISH_REQUEST]
[2025-05-07T10:33:01,461][WARN ][o.e.t.TransportService ] [coord3] Received response for a request that has timed out, sent [35s/35020ms] ago, timed out [25s/25015ms] ago, action [internal:coordination/fault_detection/follower_check], node [{data1}{RgMem9IQSeGRgi-Xu40SwA}{ePJgynj6SZ6f3s8kiMcW-Q}{data1}{192.168.1.40}{192.168.1.40:9300}{cfs}{8.18.0}{7000099-8525000}{xpack.installed=true, ml.config_version=12.0.0, transform.config_version=10.0.0}], id [7804453]
[2025-05-07T10:33:01,461][WARN ][o.e.t.TransportService ] [coord3] Received response for a request that has timed out, sent [24s/24014ms] ago, timed out [14s/14009ms] ago, action [internal:coordination/fault_detection/follower_check], node [{data1}{RgMem9IQSeGRgi-Xu40SwA}{ePJgynj6SZ6f3s8kiMcW-Q}{data1}{192.168.1.40}{192.168.1.40:9300}{cfs}{8.18.0}{7000099-8525000}{xpack.installed=true, ml.config_version=12.0.0, transform.config_version=10.0.0}], id [7804708]
[2025-05-07T10:33:01,462][WARN ][o.e.t.TransportService ] [coord3] Received response for a request that has timed out, sent [13s/13009ms] ago, timed out [3s/3002ms] ago, action [internal:coordination/fault_detection/follower_check], node [{data1}{RgMem9IQSeGRgi-Xu40SwA}{ePJgynj6SZ6f3s8kiMcW-Q}{data1}{192.168.1.40}{192.168.1.40:9300}{cfs}{8.18.0}{7000099-8525000}{xpack.installed=true, ml.config_version=12.0.0, transform.config_version=10.0.0}], id [7805045]
[2025-05-07T10:33:02,358][INFO ][o.e.c.r.a.AllocationService] [coord3] current.health="YELLOW" message="Cluster health status changed from [GREEN] to [YELLOW] (reason: [{data1}{RgMem9IQSeGRgi-Xu40SwA}{ePJgynj6SZ6f3s8kiMcW-Q}{data1}{192.168.1.40}{192.168.1.40:9300}{cfs}{8.18.0}{7000099-8525000} reason: followers check retry count exceeded [timeouts=3, failures=0]])." previous.health="GREEN" reason="{data1}{RgMem9IQSeGRgi-Xu40SwA}{ePJgynj6SZ6f3s8kiMcW-Q}{data1}{192.168.1.40}{192.168.1.40:9300}{cfs}{8.18.0}{7000099-8525000} reason: followers check retry count exceeded [timeouts=3, failures=0]"
Elasticsearch is installed from the RPM packages (RHEL 9.5), and the TCP retransmission count has been set to 5 per the guidance at https://www.elastic.co/docs/deploy-manage/deploy/self-managed/system-config-tcpretries. I've also tried reducing the number of connections to S3 by setting s3.client.default.max_connections: 40 in the elasticsearch.yml on each node.
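For reference, this is roughly how those two changes were applied (the sysctl file name is just the one I picked; any file under /etc/sysctl.d/ would do, and the max_connections value is the one mentioned above):

# /etc/sysctl.d/99-elasticsearch.conf — TCP retransmission count per the linked guidance
net.ipv4.tcp_retries2 = 5

# /etc/elasticsearch/elasticsearch.yml on each node — limit S3 repository client connections
s3.client.default.max_connections: 40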
At this point I'm out of ideas on what to try next to remedy the situation.