Out of memory while moving data from warm to cold tier

My data nodes have crashed a few times while indices were being moved from the warm tier to the cold tier.
The warm-tier nodes have 8GB of memory, which implies a 4GB heap due to Elasticsearch's automatic heap sizing,
while the cold-tier nodes have 4GB of memory and therefore a 2GB heap.

How can I avoid this issue? What configuration is suggested?

elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | {"@timestamp":"2022-07-23T06:18:10.970Z", "log.level": "WARN", "message":"[gc][2300398] overhead, spent [2.3s] collecting in the last [2.4s]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][scheduler][T#1]","log.logger":"org.elasticsearch.monitor.jvm.JvmGcMonitorService","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | java.lang.OutOfMemoryError: Java heap space
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | Dumping heap to data/java_pid7.hprof ...
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | {"@timestamp":"2022-07-23T06:18:19.799Z", "log.level": "WARN", "message":"execution of [ReschedulingRunnable{runnable=org.elasticsearch.watcher.ResourceWatcherService$ResourceMonitor@18d5ffb8, interval=1m}] took [8078ms] which is above the warn threshold of [5000ms]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][scheduler][T#1]","log.logger":"org.elasticsearch.threadpool.ThreadPool","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | {"@timestamp":"2022-07-23T06:18:19.800Z", "log.level": "WARN", "message":"[gc][2300399] overhead, spent [8.7s] collecting in the last [8.8s]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][scheduler][T#1]","log.logger":"org.elasticsearch.monitor.jvm.JvmGcMonitorService","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | {"@timestamp":"2022-07-23T06:18:19.800Z", "log.level": "WARN", "message":"Unexpected exception from an event executor: ", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][transport_worker][T#58]","log.logger":"io.netty.util.concurrent.SingleThreadEventExecutor","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster","error.type":"java.lang.OutOfMemoryError","error.message":"Java heap space","error.stack_trace":"java.lang.OutOfMemoryError: Java heap space\n\tat java.base/java.lang.Integer.valueOf(Integer.java:1081)\n\tat java.base/sun.nio.ch.EPollSelectorImpl.processEvents(EPollSelectorImpl.java:192)\n\tat java.base/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:135)\n\tat java.base/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)\n\tat java.base/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)\n\tat io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | {"@timestamp":"2022-07-23T06:18:32.940Z", "log.level": "WARN", "message":"timer thread slept for [13.1s/13144ms] on absolute clock which is above the warn threshold of [5000ms]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][[timer]]","log.logger":"org.elasticsearch.threadpool.ThreadPool","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | {"@timestamp":"2022-07-23T06:18:32.941Z", "log.level": "WARN", "message":"timer thread slept for [13.1s/13144341782ns] on relative clock which is above the warn threshold of [5000ms]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es_data_hdd_7_1][[timer]]","log.logger":"org.elasticsearch.threadpool.ThreadPool","elasticsearch.cluster.uuid":"XDEw48F5SEu3KcS3_jDNcw","elasticsearch.node.id":"lRwdvvfdS1iYAErk17YcyQ","elasticsearch.node.name":"es_data_hdd_7_1","elasticsearch.cluster.name":"elk_cluster"}
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | Heap dump file created [2919329118 bytes in 20.309 secs]
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | Terminating due to java.lang.OutOfMemoryError: Java heap space
elk_cluster_es_data_hdd_7_1.0.jo0n454y4a1t@prod_server1    | 2022-07-23 06:18:33,273156 UTC [696] INFO  Main.cc@112 Parent process died - ML controller exiting

Which nodes crash? Which version of Elasticsearch are you using?

I have an ELK cluster on 3 physical hosts,
and all of the cold-tier data nodes crashed.
I'm using elasticsearch:8.1.0.
Today I increased the memory on these nodes from 4GB to 8GB, which is why you now see heap.max of 4GB on them:

name id node.role heap.current heap.percent heap.max
es_data_hdd_3_2 BCoX w 1.3gb 34 4gb
es_data_hdd_4_2 EUum w 1.9gb 49 4gb
es_master_1_1 mEM4 m 1.4gb 18 8gb
es_master_1_2 H2xf m 3.4gb 43 8gb
es_data_ssd_5_3 WqnN hs 5.2gb 65 8gb
es_data_hdd_2_3 36x- w 2gb 52 4gb
es_data_hdd_6_3 zOxf w 3.2gb 82 4gb
es_data_hdd_8_1 AD1I c 1.7gb 42 4gb
es_data_ssd_1_2 1YxZ hs 5.4gb 67 8gb
es_data_hdd_8_2 MHyq c 1.4gb 35 4gb
es_data_ssd_3_1 hNM_ hs 3.9gb 49 8gb
es_data_ssd_4_1 DsIv hs 5.3gb 67 8gb
es_data_ssd_1_3 azCW hs 4.5gb 56 8gb
es_data_hdd_7_2 W79A c 2gb 52 4gb
es_data_hdd_7_1 lRwd c 2.9gb 74 4gb
es_data_hdd_3_1 OAUc w 1.7gb 42 4gb
es_data_hdd_3_3 HU8a w 2.3gb 58 4gb
es_data_hdd_1_2 Mh42 w 2.8gb 70 4gb
es_data_ssd_2_1 p42W hs 5gb 63 8gb
es_data_hdd_6_1 mnVH w 3.2gb 81 4gb
es_data_hdd_4_1 FdQI w 2.6gb 66 4gb
es_data_ssd_5_1 XMyL hs 4.2gb 53 8gb
es_data_hdd_2_2 nZ1t w 2.4gb 61 4gb
es_data_hdd_1_3 -GXd w 1.7gb 42 4gb
es_data_hdd_9_2 ZNlx c 2.7gb 69 4gb
es_data_ssd_1_1 fI-r hs 4.2gb 53 8gb
es_data_hdd_5_1 MZ-C w 2.9gb 74 4gb
es_master_1_3 K7Bz m 3.7gb 46 8gb
es_data_ssd_5_2 ifoM hs 5.6gb 70 8gb
es_data_hdd_8_3 atMw c 1.3gb 33 4gb
es_data_ssd_4_3 eRk6 hs 3.8gb 47 8gb
es_data_ssd_3_3_ingest plQ2 i 1011.7mb 24 4gb
es_data_hdd_9_1 5L7Y c 1.6gb 41 4gb
es_data_ssd_3_2 fjWT hs 4gb 50 8gb
es_master_2_3 3fnz m 1.5gb 18 8gb
es_data_hdd_5_2 xOzZ w 2.4gb 62 4gb
es_data_ssd_4_2 NPcm hs 4.7gb 59 8gb
es_data_hdd_6_2 HT4p w 1.9gb 48 4gb
es_data_hdd_5_3 4S0G w 1.5gb 38 4gb
es_data_ssd_3_2_ingest vKag i 1gb 27 4gb
es_master_2_2 0kgj m 2.4gb 30 8gb
es_data_hdd_9_3 glsU c 2.3gb 59 4gb
es_master_2_1 RBkT m 1.9gb 23 8gb
es_data_ssd_3_3 8RZ_ hs 4.1gb 52 8gb
es_data_hdd_7_3 zPzz c 1.6gb 41 4gb
es_data_ssd_2_3 YEzE hs 5.3gb 66 8gb
es_data_ssd_2_2 wgCi hs 4.7gb 59 8gb
es_data_hdd_2_1 GJE6 w 1.1gb 28 4gb
es_data_hdd_4_3 rygO w 3.2gb 81 4gb
es_data_hdd_1_1 DyMO w 3gb 77 4gb
es_data_ssd_3_1_ingest u4dz i 2.1gb 53 4gb

What does your ILM policy look like? How much data do the cold tier nodes hold?


PUT _ilm/policy/prod_data_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "60d",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      }
    }
  }
}
node            shards disk.indices disk.used disk.avail disk.total disk.percent
es_data_hdd_8_1    236       84.4gb      90gb        2tb      2.1tb            4
es_data_hdd_8_3    236       83.8gb    89.4gb        2tb      2.1tb            4
es_data_hdd_8_2    236       94.7gb   100.3gb        2tb      2.1tb            4
es_data_hdd_7_2    236       94.4gb     100gb        2tb      2.1tb            4
es_data_hdd_7_3    236         87gb    89.8gb        2tb      2.1tb            4
es_data_hdd_9_1    236       71.2gb    76.8gb      2.1tb      2.1tb            3
es_data_hdd_9_2    236       87.3gb    92.8gb        2tb      2.1tb            4
es_data_hdd_7_1    236       84.3gb    89.9gb        2tb      2.1tb            4
es_data_hdd_9_3    237       46.4gb    49.2gb      2.1tb      2.1tb            2

I would recommend upgrading to Elasticsearch 8.3, as it includes a lot of improvements with respect to heap usage. At the moment it looks like you have a lot of very small shards, which is inefficient. Please have a look at the guidance in this blog post, and note the changed recommendations for version 8.3 onwards.
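To confirm the small-shard picture yourself, you can list shards sorted by store size with the `_cat/shards` API (column selection here is just one reasonable choice):

```
GET _cat/shards?v=true&s=store:desc&h=index,shard,prirep,store,node
```

If most shards come back well under a gigabyte, consolidating them (via shrink or wider rollover targets) is usually worthwhile.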

With respect to your ILM policy, it seems you are using time-based indices without rollover and simply move them between tiers without making any changes along the way. Given this, I would expect indices in all tiers to have the same level of heap usage/overhead, and the amount of disk space allocated to the cold tier relative to the allocated heap looks excessive. I suspect your cold nodes, with the current config, would only be able to hold about the same amount of data as the warm nodes in relation to their heap.

When deploying a hot-warm-cold architecture you generally try to optimise the memory usage of your indices once they reach the warm tier, by force-merging them down to a single segment and making them read-only. If your indices have more than one primary shard, you may also want to shrink them down to a single primary shard, as your indices appear quite small. This allows the cold tier to hold a lot more data than the warm tier, where this optimisation may not yet have taken place.
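As a sketch of what that could look like in your policy, the warm phase can carry `readonly`, `shrink`, and `forcemerge` actions alongside the existing `set_priority` (the `min_age` values are kept from your policy; verify your shard counts before enabling `shrink`):

```
PUT _ilm/policy/prod_data_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 50 },
          "readonly": {},
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "cold": {
        "min_age": "60d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      }
    }
  }
}
```

ILM orders these actions itself within the phase, so you do not need to worry about the order they appear in the `actions` object.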

You may also want to force-merge the indices already on the cold tier down to a single segment, then check index and node stats to see the effect on heap usage. Be aware that this can be I/O intensive, so do it slowly over time.
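For indices already on the cold tier, this can be done manually one index at a time (the index name below is hypothetical):

```
# Block writes first so the merged segments stay merged
PUT /my-old-index/_settings
{
  "index.blocks.write": true
}

# Merge down to a single segment (I/O intensive; run during quiet hours)
POST /my-old-index/_forcemerge?max_num_segments=1

# Inspect segment counts and memory afterwards
GET /my-old-index/_stats/segments
```

Running this serially across indices, rather than in parallel, keeps the I/O impact on the HDD-backed cold nodes manageable.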

Perfect, many thanks for the clear input :)
