Is there anything I can do to help, like collecting more detailed information?
It seems reproducible in my environment. I'm doing intensive time-series bulk indexing, occasionally deleting expired indices and creating new ones once a threshold is reached. Also, every 10 minutes or so I query a couple of health stats. When the issue happened, there were no GUI queries, so it was most likely triggered by the stats queries.
Here are some CPU/RAM stats from a few seconds after the error first occurred:
CPU/RAM Check
us:100.8 ni:0 sy:11.2 id:673.3 wa:6.2 hi:0 si:0.6 st:0.8
elasticsearch:0.3%/20833M
node-es-app:0.7%/280M
node-gui-app:0%/136M
node-kafka-consumer-app:0.3%/222M
GC Log
The first error timestamp is Dec 25 2020 14:36:29.945 (local time), which is 2020-12-25T22:36:29.945+0000 in the GC log's timestamps:
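For anyone who wants to line up application log entries with the GC log, here is a minimal sketch of the conversion. The fixed UTC-8 offset is an assumption inferred from the 8-hour gap between the two timestamps above, and the helper name `to_gc_log_timestamp` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the application log uses a fixed UTC-8 offset
# (inferred from the 8-hour gap, not confirmed by any config).
LOCAL_TZ = timezone(timedelta(hours=-8))


def to_gc_log_timestamp(local_text: str) -> str:
    """Convert 'Dec 25 2020 14:36:29.945' to the GC log's UTC format."""
    local = datetime.strptime(local_text, "%b %d %Y %H:%M:%S.%f")
    utc = local.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}+0000"


print(to_gc_log_timestamp("Dec 25 2020 14:36:29.945"))
```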
[2020-12-25T22:35:27.802+0000][17629][gc,task ] GC(5572) Using 8 workers of 8 for evacuation
[2020-12-25T22:35:27.802+0000][17629][gc,age ] GC(5572) Desired survivor size 645922816 bytes, new threshold 15 (max threshold 15)
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) Age table with threshold 15 (max threshold 15)
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 1: 96866720 bytes, 96866720 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 2: 875688 bytes, 97742408 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 3: 1148584 bytes, 98890992 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 4: 1228376 bytes, 100119368 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 5: 671064 bytes, 100790432 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 6: 1317520 bytes, 102107952 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 7: 2431152 bytes, 104539104 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 8: 1062832 bytes, 105601936 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 9: 1093856 bytes, 106695792 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 10: 931104 bytes, 107626896 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 11: 960808 bytes, 108587704 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 12: 1379720 bytes, 109967424 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 13: 1897664 bytes, 111865088 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 14: 1075400 bytes, 112940488 total
[2020-12-25T22:35:27.845+0000][17629][gc,age ] GC(5572) - age 15: 1100864 bytes, 114041352 total
[2020-12-25T22:35:27.845+0000][17629][gc,phases ] GC(5572) Pre Evacuate Collection Set: 0.5ms
[2020-12-25T22:35:27.845+0000][17629][gc,phases ] GC(5572) Merge Heap Roots: 0.4ms
[2020-12-25T22:35:27.845+0000][17629][gc,phases ] GC(5572) Evacuate Collection Set: 34.4ms
[2020-12-25T22:35:27.845+0000][17629][gc,phases ] GC(5572) Post Evacuate Collection Set: 7.1ms
[2020-12-25T22:35:27.845+0000][17629][gc,phases ] GC(5572) Other: 0.5ms
[2020-12-25T22:35:27.845+0000][17629][gc,heap ] GC(5572) Eden regions: 1217->0(1213)
[2020-12-25T22:35:27.845+0000][17629][gc,heap ] GC(5572) Survivor regions: 11->15(154)
[2020-12-25T22:35:27.845+0000][17629][gc,heap ] GC(5572) Old regions: 120->121
[2020-12-25T22:35:27.845+0000][17629][gc,heap ] GC(5572) Archive regions: 2->2
[2020-12-25T22:35:27.845+0000][17629][gc,heap ] GC(5572) Humongous regions: 8->8
[2020-12-25T22:35:27.845+0000][17629][gc,metaspace] GC(5572) Metaspace: 81817K(84304K)->81817K(84304K) NonClass: 72150K(73936K)->72150K(73936K) Class: 9667K(10368K)->9667K(10368K)
[2020-12-25T22:35:27.845+0000][17629][gc ] GC(5572) Pause Young (Normal) (G1 Evacuation Pause) 10855M->1147M(16384M) 43.178ms
[2020-12-25T22:35:27.845+0000][17629][gc,cpu ] GC(5572) User=0.09s Sys=0.00s Real=0.05s
[2020-12-25T22:35:27.845+0000][17629][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 187299620267 ns, Reaching safepoint: 368638 ns, At safepoint: 43323743 ns, Total: 43692381 ns
[2020-12-25T22:36:29.858+0000][17629][safepoint ] Safepoint "Cleanup", Time since last: 62012215944 ns, Reaching safepoint: 181759 ns, At safepoint: 15241 ns, Total: 197000 ns
[2020-12-25T22:36:30.858+0000][17629][safepoint ] Safepoint "Cleanup", Time since last: 1000309566 ns, Reaching safepoint: 179210 ns, At safepoint: 10574 ns, Total: 189784 ns
[2020-12-25T22:36:53.862+0000][17629][safepoint ] Safepoint "Cleanup", Time since last: 23003486317 ns, Reaching safepoint: 199146 ns, At safepoint: 9706 ns, Total: 208852 ns
[2020-12-25T22:37:04.864+0000][17629][safepoint ] Safepoint "Cleanup", Time since last: 11001632988 ns, Reaching safepoint: 329561 ns, At safepoint: 22201 ns, Total: 351762 ns
[2020-12-25T22:38:28.878+0000][17629][safepoint ] Safepoint "Cleanup", Time since last: 84013263412 ns, Reaching safepoint: 352420 ns, At safepoint: 15756 ns, Total: 368176 ns
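In case it helps with checking whether any pause lines up with the error, below is a rough sketch that pulls the safepoint entries out of a log in the format pasted above and reports each total pause in milliseconds. The regular expression is written against these lines only and may need adjusting for other JVM versions or logging options; `parse_safepoints` is a hypothetical helper name.

```python
import re

# Matches safepoint lines in the format pasted above (an assumption:
# other JVM versions or -Xlog settings may print different fields).
SAFEPOINT_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\]\[\d+\]\[safepoint\s*\] Safepoint "(?P<name>[^"]+)", '
    r"Time since last: (?P<since>\d+) ns, "
    r"Reaching safepoint: (?P<reach>\d+) ns, "
    r"At safepoint: (?P<at>\d+) ns, "
    r"Total: (?P<total>\d+) ns"
)


def parse_safepoints(lines):
    """Return (timestamp, safepoint name, total pause in ms) per matching line."""
    results = []
    for line in lines:
        match = SAFEPOINT_RE.match(line.strip())
        if match:
            results.append((match["ts"], match["name"], int(match["total"]) / 1e6))
    return results


# Two lines copied from the log above.
sample = [
    '[2020-12-25T22:35:27.845+0000][17629][safepoint ] Safepoint "G1CollectForAllocation", '
    "Time since last: 187299620267 ns, Reaching safepoint: 368638 ns, "
    "At safepoint: 43323743 ns, Total: 43692381 ns",
    '[2020-12-25T22:36:29.858+0000][17629][safepoint ] Safepoint "Cleanup", '
    "Time since last: 62012215944 ns, Reaching safepoint: 181759 ns, "
    "At safepoint: 15241 ns, Total: 197000 ns",
]

for ts, name, total_ms in parse_safepoints(sample):
    print(f"{ts} {name}: {total_ms:.3f} ms")
```

On the excerpt above, the only pause longer than a millisecond is the young collection at 22:35:27, about a minute before the first error; the safepoints around 22:36:29 are all well under 1 ms.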