Elasticsearch failing at 3am

For the last two nights, at around 3am, Elasticsearch has stopped and Kibana has hung. The message in /var/log/messages is:
Unable to retrieve version information from Elasticsearch nodes

Any help would be greatly appreciated.

What is in the Elasticsearch logs around that time?

[2020-09-23T06:55:51.714+0000][1482][safepoint ] Safepoint "G1Concurrent", Time since last: 646779250 ns, Reaching safepoint: 329420 ns, At safepoint: 2211553 ns, Total: 2540973 ns
[2020-09-23T06:55:51.715+0000][1482][gc,marking ] GC(50123) Concurrent Cleanup for Next Mark
[2020-09-23T06:55:51.723+0000][1482][gc,marking ] GC(50123) Concurrent Cleanup for Next Mark 8.157ms
[2020-09-23T06:55:51.723+0000][1482][gc ] GC(50123) Concurrent Cycle 1721.989ms
[2020-09-23T06:56:10.415+0000][1482][safepoint ] Safepoint "BulkRevokeBias", Time since last: 18698528072 ns, Reaching safepoint: 396304 ns, At safepoint: 1797244 ns, Total: 2193548 ns
[2020-09-23T06:56:17.941+0000][1482][safepoint ] Safepoint "BulkRevokeBias", Time since last: 7523640583 ns, Reaching safepoint: 514421 ns, At safepoint: 1724972 ns, Total: 2239393 ns
[2020-09-23T06:56:38.828+0000][1482][safepoint ] Safepoint "BulkRevokeBias", Time since last: 20885374837 ns, Reaching safepoint: 373970 ns, At safepoint: 1515062 ns, Total: 1889032 ns
[2020-09-23T06:57:00.544+0000][1482][safepoint ] Safepoint "BulkRevokeBias", Time since last: 21713047730 ns, Reaching safepoint: 625148 ns, At safepoint: 1688940 ns, Total: 2314088 ns
[2020-09-23T06:57:06.236+0000][1482][safepoint ] Safepoint "Cleanup", Time since last: 5687420298 ns, Reaching safepoint: 4856901 ns, At safepoint: 29255 ns, Total: 4886156 ns
[2020-09-23T06:57:07.248+0000][1482][safepoint ] Safepoint "Cleanup", Time since last: 1011420448 ns, Reaching safepoint: 538328 ns, At safepoint: 6996 ns, Total: 545324 ns
[2020-09-23T06:57:11.097+0000][1482][gc,start ] GC(50124) Pause Young (Normal) (G1 Evacuation Pause)
[2020-09-23T06:57:11.101+0000][1482][gc,task ] GC(50124) Using 8 workers of 8 for evacuation
[2020-09-23T06:57:11.101+0000][1482][gc,age ] GC(50124) Desired survivor size 139460608 bytes, new threshold 15 (max threshold 15)
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) Age table with threshold 15 (max threshold 15)
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 1: 15383472 bytes, 15383472 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 2: 637920 bytes, 16021392 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 3: 891320 bytes, 16912712 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 4: 305136 bytes, 17217848 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 5: 187136 bytes, 17404984 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 6: 296616 bytes, 17701600 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 7: 150120 bytes, 17851720 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 8: 570408 bytes, 18422128 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 9: 4008 bytes, 18426136 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 10: 134216 bytes, 18560352 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 11: 184752 bytes, 18745104 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 12: 150552 bytes, 18895656 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 13: 867224 bytes, 19762880 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 14: 133160 bytes, 19896040 total
[2020-09-23T06:57:11.139+0000][1482][gc,age ] GC(50124) - age 15: 332320 bytes, 20228360 total
[2020-09-23T06:57:11.139+0000][1482][gc,phases ] GC(50124) Pre Evacuate Collection Set: 1.9ms
[2020-09-23T06:57:11.139+0000][1482][gc,phases ] GC(50124) Merge Heap Roots: 3.2ms
[2020-09-23T06:57:11.139+0000][1482][gc,phases ] GC(50124) Evacuate Collection Set: 24.5ms
[2020-09-23T06:57:11.139+0000][1482][gc,phases ] GC(50124) Post Evacuate Collection Set: 8.4ms
[2020-09-23T06:57:11.139+0000][1482][gc,phases ] GC(50124) Other: 4.2ms
[2020-09-23T06:57:11.139+0000][1482][gc,heap ] GC(50124) Eden regions: 1047->0(1046)
[2020-09-23T06:57:11.139+0000][1482][gc,heap ] GC(50124) Survivor regions: 11->11(133)
[2020-09-23T06:57:11.139+0000][1482][gc,heap ] GC(50124) Old regions: 683->683
[2020-09-23T06:57:11.139+0000][1482][gc,heap ] GC(50124) Archive regions: 2->2
[2020-09-23T06:57:11.139+0000][1482][gc,heap ] GC(50124) Humongous regions: 153->153
[2020-09-23T06:57:11.139+0000][1482][gc,metaspace ] GC(50124) Metaspace: 120117K(129100K)->120117K(129100K) NonClass: 105856K(111104K)->105856K(111104K) Class: 14261K(17996K)->14261K(17996K)
[2020-09-23T06:57:11.139+0000][1482][gc ] GC(50124) Pause Young (Normal) (G1 Evacuation Pause) 3788M->1694M(5120M) 42.239ms
[2020-09-23T06:57:11.139+0000][1482][gc,cpu ] GC(50124) User=0.07s Sys=0.01s Real=0.04s
[2020-09-23T06:57:11.139+0000][1482][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 3848750935 ns, Reaching safepoint: 317449 ns, At safepoint: 42385954 ns, Total: 42703403 ns
[2020-09-23T06:57:47.907+0000][1482][safepoint ] Safepoint "Cleanup", Time since last: 36766879094 ns, Reaching safepoint: 433248 ns, At safepoint: 6596 ns, Total: 439844 ns
[2020-09-23T06:57:48.910+0000][1482][safepoint ] Safepoint "Cleanup", Time since last: 1002688670 ns, Reaching safepoint: 483798 ns, At safepoint: 11707 ns, Total: 495505 ns
[2020-09-23T06:57:52.924+0000][1482][safepoint ] Safepoint "Cleanup", Time since last: 4013802994 ns, Reaching safepoint: 409705 ns, At safepoint: 9072 ns, Total: 418777 ns
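
The excerpt above is JVM GC/safepoint logging, so one quick thing to check is whether any stop-the-world pause was long enough to make the node unresponsive. A minimal sketch, assuming the line format is exactly as pasted above; the regex and the `safepoint_pauses` helper are illustrative, not part of any Elasticsearch tooling:

```python
import re

# Matches safepoint lines in the format shown in the excerpt above (assumed format).
SAFEPOINT_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\]\[\d+\]\[safepoint\s*\] '
    r'Safepoint "(?P<name>[^"]+)".*Total: (?P<total>\d+) ns'
)

def safepoint_pauses(lines):
    """Yield (timestamp, safepoint name, total pause in ms) for each safepoint line."""
    for line in lines:
        m = SAFEPOINT_RE.match(line)
        if m:
            yield m.group("ts"), m.group("name"), int(m.group("total")) / 1e6

# Three lines copied from the excerpt; the first is not a safepoint line and is skipped.
sample = [
    '[2020-09-23T06:57:11.139+0000][1482][gc ] GC(50124) Pause Young (Normal) (G1 Evacuation Pause) 3788M->1694M(5120M) 42.239ms',
    '[2020-09-23T06:57:11.139+0000][1482][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 3848750935 ns, Reaching safepoint: 317449 ns, At safepoint: 42385954 ns, Total: 42703403 ns',
    '[2020-09-23T06:57:47.907+0000][1482][safepoint ] Safepoint "Cleanup", Time since last: 36766879094 ns, Reaching safepoint: 433248 ns, At safepoint: 6596 ns, Total: 439844 ns',
]

pauses = list(safepoint_pauses(sample))
longest = max(pauses, key=lambda p: p[2])
print(longest)
```

In the lines pasted here, the longest pause is the roughly 42.7 ms young collection, which by itself would not explain a hang.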

And this is from this morning's /var/log/messages, when it crashed:

Sep 24 03:12:42 ST-WAZUH kibana[2801]: {"type":"response","@timestamp":"2020-09-24T07:12:42Z","tags":,"pid":2801,"method":"get","statusCode":302,"req":{"url":"/","method":"get","headers":{"user-agent":"check_http/v2.2.1 (nagios-plugins 2.2.1)","connection":"close","accept":"/"},"remoteAddress":"172.16.75.170","userAgent":"172.16.75.170"},"res":{"statusCode":302,"responseTime":3,"contentLength":9},"message":"GET / 302 3ms - 9.0B"}
Sep 24 03:13:02 ST-WAZUH filebeat[1466]: 2020-09-24T03:13:02.026-0400#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":278580,"time":{"ms":6336}},"total":{"ticks":1202580,"time":{"ms":7445},"value":1202580},"user":{"ticks":924000,"time":{"ms":1109}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":11},"info":{"ephemeral_id":"dd1bb2e4-bf27-4954-803b-7944901ca109","uptime":{"ms":38341510}},"memstats":{"gc_next":19016672,"memory_alloc":13933272,"memory_total":109574463744,"rss":106496},"runtime":{"goroutines":28}},"filebeat":{"events":{"active":58,"added":382,"done":324},"harvester":{"files":{"79d8e458-e676-4e17-9fed-6e19c1e6a92d":{"last_event_published_time":"2020-09-24T03:13:01.997Z","last_event_timestamp":"2020-09-24T03:13:01.991Z","read_offset":1508909,"size":2244221}},"open_files":1,"running":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":324,"active":23,"batches":17,"total":347}},"pipeline":{"clients":1,"events":{"active":23,"published":347,"total":347},"queue":{"acked":324}}},"registrar":{"states":{"current":1,"update":324},"writes":{"success":16,"total":16}},"system":{"load":{"1":9.17,"15":4.4,"5":5.27,"norm":{"1":1.1463,"15":0.55,"5":0.6587}}}}}}
Sep 24 03:13:34 ST-WAZUH filebeat[1466]: 2020-09-24T03:13:32.630-0400#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":286530,"time":{"ms":7947}},"total":{"ticks":1211180,"time":{"ms":8594},"value":1211180},"user":{"ticks":924650,"time":{"ms":647}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":12},"info":{"ephemeral_id":"dd1bb2e4-bf27-4954-803b-7944901ca109","uptime":{"ms":38371527}},"memstats":{"gc_next":19579680,"memory_alloc":15043024,"memory_total":109583425408,"rss":12288},"runtime":{"goroutines":27}},"filebeat":{"events":{"active":96,"added":205,"done":109},"harvester":{"files":{"79d8e458-e676-4e17-9fed-6e19c1e6a92d":{"last_event_published_time":"2020-09-24T03:13:31.963Z","last_event_timestamp":"2020-09-24T03:13:31.963Z","read_offset":835542,"size":901369}},"open_files":1,"running":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":109,"active":27,"batches":6,"total":136}},"pipeline":{"clients":1,"events":{"active":133,"published":219,"total":219},"queue":{"acked":109}}},"registrar":{"states":{"current":1,"update":159},"writes":{"success":6,"total":7}},"system":{"load":{"1":12.95,"15":4.86,"5":6.55,"norm":{"1":1.6188,"15":0.6075,"5":0.8188}}}}}}
Sep 24 03:13:55 ST-WAZUH kibana[2801]: {"type":"log","@timestamp":"2020-09-24T07:13:55Z","tags":["error","savedobjects-service"],"pid":2801,"message":"Unable to retrieve version information from Elasticsearch nodes."}
Sep 24 03:13:55 ST-WAZUH kibana[2801]: {"type":"log","@timestamp":"2020-09-24T07:13:55Z","tags":["status","plugin:xpack_main@7.8.1","error"],"pid":2801,"state":"red","message":"Status changed from green to red - Unable to retrieve version information from Elasticsearch nodes.","prevState":"green","prevMsg":"Ready"}
Sep 24 03:13:55 ST-WAZUH kibana[2801]: {"type":"log","@timestamp":"2020-09-24T07:13:55Z","tags":["status","plugin:reporting@7.8.1","error"],"pid":2801,"state":"red","message":"Status changed from green to red - Unable to retrieve version information from Elasticsearch nodes.","prevState":"green","prevMsg":"Ready"}
Sep 24 03:13:55 ST-WAZUH kibana[2801]: {"type":"log","@timestamp":"2020-09-24T07:13:55Z","tags":["status","plugin:spaces@7.8.1","error"],"pid":2801,"state":"red","message":"Status changed from green to red - Unable to retrieve version information from Elasticsearch nodes.","prevState":"green","prevMsg":"Ready"}
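
When comparing the two excerpts, note that the syslog prefix is local time while the GC log and the Kibana `@timestamp` fields are UTC. A small sketch for lining them up, assuming the host is at UTC-4 (as the `-0400` in the filebeat lines suggests) and an English locale for month names; `syslog_to_utc` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the host runs at UTC-4, as the "-0400" in the filebeat lines suggests.
LOCAL_TZ = timezone(timedelta(hours=-4))

def syslog_to_utc(stamp, year=2020):
    """Convert a syslog prefix such as 'Sep 24 03:13:55' to an aware UTC datetime."""
    local = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return local.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)

crash_utc = syslog_to_utc("Sep 24 03:13:55")
print(crash_utc.isoformat())  # 2020-09-24T07:13:55+00:00
```

By this conversion the Kibana error is at 07:13:55 UTC on the 24th, whereas the GC excerpt ends at 06:57:52 UTC on the 23rd, so the GC lines posted do not cover the moment of the failure.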
