My data nodes have crashed a few times while indices were being moved from the warm data tier to the cold data tier.
The warm-tier nodes have 8GB of memory, which implies a 4GB heap under Elasticsearch's automatic heap sizing,
but the cold-tier nodes have only 4GB of memory, which gives a 2GB heap.
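To make the arithmetic explicit, here is a minimal sketch. It assumes the automatic sizing simply gives a data node half of its memory, which matches the two figures above; the exact rule may differ between Elasticsearch versions, and `auto_heap_gb` is a hypothetical helper, not an Elasticsearch API.

```python
def auto_heap_gb(node_memory_gb: float) -> float:
    """Hypothetical helper: heap size implied by an assumed 'half of memory' rule."""
    return node_memory_gb / 2


# Memory per tier as described in this post.
tiers = {"data_warm": 8, "data_cold": 4}

for tier, memory_gb in tiers.items():
    print(f"{tier}: {memory_gb}GB memory -> {auto_heap_gb(memory_gb):g}GB heap")
```

Under that assumption the cold tier has half the heap of the warm tier that the indices are coming from.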
How can I avoid this issue? What configuration is recommended?
Log output from the crashing node:
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | {"@timestamp":"2022-07-23T06:18:10.970Z", "log.level": "WARN", "message":"[gc][2300398] overhead, spent [2.3s] collecting in the last [2.4s]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][scheduler][T#1]","log.logger":"org.elasticsearch.monitor.jvm.JvmGcMonitorService","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | java.lang.OutOfMemoryError: Java heap space
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | Dumping heap to data/java_pid7.hprof ...
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | {"@timestamp":"2022-07-23T06:18:19.799Z", "log.level": "WARN", "message":"execution of [ReschedulingRunnable{runnable=org.elasticsearch.watcher.ResourceWatcherService$ResourceMonitor@18d5ffb8, interval=1m}] took [8078ms] which is above the warn threshold of [5000ms]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][scheduler][T#1]","log.logger":"org.elasticsearch.threadpool.ThreadPool","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | {"@timestamp":"2022-07-23T06:18:19.800Z", "log.level": "WARN", "message":"[gc][2300399] overhead, spent [8.7s] collecting in the last [8.8s]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][scheduler][T#1]","log.logger":"org.elasticsearch.monitor.jvm.JvmGcMonitorService","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | {"@timestamp":"2022-07-23T06:18:19.800Z", "log.level": "WARN", "message":"Unexpected exception from an event executor: ", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][transport_worker][T#58]","log.logger":"io.netty.util.concurrent.SingleThreadEventExecutor","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster","error.type":"java.lang.OutOfMemoryError","error.message":"Java heap space","error.stack_trace":"java.lang.OutOfMemoryError: Java heap space\n\tat java.base/java.lang.Integer.valueOf(Integer.java:1081)\n\tat java.base/sun.nio.ch.EPollSelectorImpl.processEvents(EPollSelectorImpl.java:192)\n\tat java.base/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:135)\n\tat java.base/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)\n\tat java.base/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)\n\tat io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | {"@timestamp":"2022-07-23T06:18:32.940Z", "log.level": "WARN", "message":"timer thread slept for [13.1s/13144ms] on absolute clock which is above the warn threshold of [5000ms]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][[timer]]","log.logger":"org.elasticsearch.threadpool.ThreadPool","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | {"@timestamp":"2022-07-23T06:18:32.941Z", "log.level": "WARN", "message":"timer thread slept for [13.1s/13144341782ns] on relative clock which is above the warn threshold of [5000ms]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][[timer]]","log.logger":"org.elasticsearch.threadpool.ThreadPool","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | Heap dump file created [2919329118 bytes in 20.309 secs]
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | Terminating due to java.lang.OutOfMemoryError: Java heap space
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1 | 2022-07-23 06:18:33,273156 UTC [696] INFO Main.cc@112 Parent process died - ML controller exiting