Hello,
We have noticed the following: after the GC performs a Pause Full (G1 Evacuation Pause), the memory is released, but the node cannot rejoin the cluster without a restart of the ES service. The server still replied on port 9200, but the ES part was not responding. The GC logs are attached.
The cluster has 4 nodes, and each node holds 504 shards.
We are running ES 7.13.2 on Ubuntu 20 with JVM 16 (the default JDK) and these settings:
-Xms30g
-Xmx30g
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
The node has 64 GB of RAM and SSD storage.
[2021-07-08T12:03:31.793+0000][1086][gc,start ] GC(141124) Pause Full (G1 Evacuation Pause)
[2021-07-08T12:03:31.802+0000][1086][gc,phases,start] GC(141124) Phase 1: Mark live objects
[2021-07-08T12:03:31.996+0000][1086][gc,phases ] GC(141124) Phase 1: Mark live objects 193.883ms
[2021-07-08T12:03:31.996+0000][1086][gc,phases,start] GC(141124) Phase 2: Prepare for compaction
[2021-07-08T12:03:32.067+0000][1086][gc,phases ] GC(141124) Phase 2: Prepare for compaction 71.152ms
[2021-07-08T12:03:32.067+0000][1086][gc,phases,start] GC(141124) Phase 3: Adjust pointers
[2021-07-08T12:03:32.168+0000][1086][gc,phases ] GC(141124) Phase 3: Adjust pointers 101.137ms
[2021-07-08T12:03:32.168+0000][1086][gc,phases,start] GC(141124) Phase 4: Compact heap
[2021-07-08T12:03:32.662+0000][1086][gc,phases ] GC(141124) Phase 4: Compact heap 493.110ms
[2021-07-08T12:03:32.705+0000][1086][gc,heap ] GC(141124) Eden regions: 0->0(693)
[2021-07-08T12:03:32.705+0000][1086][gc,heap ] GC(141124) Survivor regions: 0->0(0)
[2021-07-08T12:03:32.705+0000][1086][gc,heap ] GC(141124) Old regions: 1860->682
[2021-07-08T12:03:32.705+0000][1086][gc,heap ] GC(141124) Archive regions: 2->2
[2021-07-08T12:03:32.705+0000][1086][gc,heap ] GC(141124) Humongous regions: 58->37
[2021-07-08T12:03:32.705+0000][1086][gc,metaspace ] GC(141124) Metaspace: 124071K(125568K)->124068K(125568K) NonClass: 108738K(109568K)->108735K(109568K) Class: 15333K(16000K)->15332K(16000K)
[2021-07-08T12:03:32.705+0000][1086][gc ] GC(141124) Pause Full (G1 Evacuation Pause) 30392M->11255M(30720M) 912.185ms
[2021-07-08T12:03:32.705+0000][1086][gc,cpu ] GC(141124) User=17.70s Sys=0.06s Real=0.93s
[2021-07-08T12:03:32.705+0000][1086][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 1801792 ns, Reaching safepoint: 421106 ns, At safepoint: 945002207 ns, Total: 945423313 ns
[2021-07-08T12:03:32.705+0000][1086][gc,marking ] GC(141120) Concurrent Rebuild Remembered Sets 1786.856ms
[2021-07-08T12:03:32.705+0000][1086][gc,marking ] GC(141120) Concurrent Mark Abort
[2021-07-08T12:03:32.705+0000][1086][gc ] GC(141120) Concurrent Mark Cycle 3248.726ms
[2021-07-08T12:03:32.749+0000][1086][safepoint ] Safepoint "ICBufferFull", Time since last: 43352922 ns, Reaching safepoint: 744754 ns, At safepoint: 17056 ns, Total: 761810 ns
[2021-07-08T12:03:33.317+0000][1086][safepoint ] Safepoint "ICBufferFull", Time since last: 567216659 ns, Reaching safepoint: 283036 ns, At safepoint: 16358 ns, Total: 299394 ns
[2021-07-08T12:03:33.317+0000][1086][safepoint ] Safepoint "ICBufferFull", Time since last: 167361 ns, Reaching safepoint: 144589 ns, At safepoint: 27664 ns, Total: 172253 ns
[2021-07-08T12:03:33.317+0000][1086][safepoint ] Safepoint "ICBufferFull", Time since last: 51763 ns, Reaching safepoint: 109378 ns, At safepoint: 13468 ns, Total: 122846 ns
[2021-07-08T12:03:33.317+0000][1086][safepoint ] Safepoint "ICBufferFull", Time since last: 40818 ns, Reaching safepoint: 187398 ns, At safepoint: 14352 ns, Total: 201750 ns
[2021-07-08T12:03:33.375+0000][1086][gc,heap,exit ] Heap
[2021-07-08T12:03:33.375+0000][1086][gc,heap,exit ] garbage-first heap total 31457280K, used 12279144K [0x0000000080000000, 0x0000000800000000)
[2021-07-08T12:03:33.375+0000][1086][gc,heap,exit ] region size 16384K, 47 young (770048K), 0 survivors (0K)
[2021-07-08T12:03:33.375+0000][1086][gc,heap,exit ] Metaspace used 124164K, committed 125632K, reserved 1163264K
[2021-07-08T12:03:33.375+0000][1086][gc,heap,exit ] class space used 15349K, committed 16000K, reserved 1048576K
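
In case it helps anyone reading the log above, here is a minimal sketch for pulling the heap numbers out of the `Pause Full` summary line. It assumes the unified-logging format shown in this post; the regex and the `parse_full_pause` helper are hypothetical names written for illustration, not part of any Elasticsearch or JDK tooling.

```python
import re

# Matches Full GC summary lines in the format pasted above, for example:
# ... GC(141124) Pause Full (G1 Evacuation Pause) 30392M->11255M(30720M) 912.185ms
# Assumption: sizes are logged in megabytes with an "M" suffix, as in this log.
PAUSE_RE = re.compile(
    r"GC\((?P<id>\d+)\) Pause Full \((?P<cause>[^)]+)\) "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<total>\d+)M\) (?P<ms>[\d.]+)ms"
)


def parse_full_pause(line):
    """Return heap usage and pause time for a Full GC summary line, else None."""
    m = PAUSE_RE.search(line)
    if m is None:
        return None
    before, after, total = int(m["before"]), int(m["after"]), int(m["total"])
    return {
        "gc_id": int(m["id"]),
        "cause": m["cause"],
        "before_mb": before,
        "after_mb": after,
        "total_mb": total,
        "freed_mb": before - after,
        "pause_ms": float(m["ms"]),
        # Heap occupancy just before the collection, as a percentage.
        "occupancy_before_pct": round(100 * before / total, 1),
    }


sample = (
    "[2021-07-08T12:03:32.705+0000][1086][gc ] GC(141124) "
    "Pause Full (G1 Evacuation Pause) 30392M->11255M(30720M) 912.185ms"
)
summary = parse_full_pause(sample)
print(summary)
```

For the pause shown above, this reports 19137 MB freed and a heap that was about 98.9% full before the collection. The `gc,start` line for the same GC id carries no sizes, so the helper returns `None` for it.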