High sustained read traffic from Elasticsearch hot pods

cjpowers · March 18, 2026, 3:44pm

I’m running an EFK stack in an RKE2 cluster on RHEL 8.5. The VMs are hosted in vSphere and use centralized NetApp storage.
Our Elasticsearch footprint currently looks like this:

2 master nodes
2 client nodes
2 hot data pods

We monitor pod throughput in Grafana, and we’ve been noticing that after the hot pods have been running for a while, the network eventually becomes saturated with what appears to be traffic to the NetApp storage.
What’s odd is that this doesn’t happen immediately. Sometimes it starts after about a day, other times it may take a week. But once it starts, the hot pods generate sustained read traffic for hours or even days. Before it starts its around 10-150MB/s
What we’ve observed so far:

The traffic appears to be mostly read traffic, not write traffic.
The duration seems related to max_primary_shard_size.
The smaller the shard size, the longer the sustained activity seems to continue.

Current config:

eck-apps:
  enabled: true
  elasticsearch:
    externalLoggingEnabled: false
    master:
      replicas: 2
      storage: 6Gi
      memory: 6Gi
    dataHot:
      replicas: 1
      storage: 50Gi
      memory: 8Gi
      cpu: 1500m
    client:
      replicas: 2
      storage: 6Gi
      memory: 2Gi
    clientExternal:
      replicas: 2
      memory: 4Gi
  logging:
    number_of_shards: 2
    number_of_replicas: 0
    rollover: 
      max_age: "1d"
      max_primary_shard_size: "20GB"
    shrink:
      number_of_shards: 1
    delete:
      min_age: "1d"

My questions are:

Has anyone else seen this kind of behavior?
Is there a known trigger for this kind of long-running sustained read activity?
Does this sound like it could be tied to ILM rollover/shrink, segment merges, or shard relocation?
Note: This is still happening if I run it with only 1 Hot pod and 0 logging replicas.
Why would the traffic be so heavily read-oriented, especially against backend storage?

Any insight would be appreciated.

RainTown · March 19, 2026, 5:33pm

Welcome to the forum.

I am a bit surprised with no response. I have zero familiarity with RKE2 clusters, but Netapp storage is not common here, as elasticsearch is well known to perform best with locally attached disks (as fast as you can get) storage, as compared to networked storage.

The read dominance suggests perhaps a lot of segment merging.

In passing, only 2 master-eligible nodes is not a great idea, if that is what you have.

You may wish to share outputs (DevTools) from

GET _cat/nodes?v&h=name,role,version,master,u,cpu,disk.used,disk.avail,disk.total

GET _cat/indices?v&s=index

GET _nodes/stats/indices/merges

GET _cat/segments?v

particularly when the cluster gets into that network-is-being-saturated mode

cjpowers · March 20, 2026, 3:59pm

Thanks for the advice, Kevin, I took the master’s up to 3, and forced a rollover, I’ll capture those GETs when it spikes again. Thanks again.

RainTown · March 24, 2026, 8:27pm

any update?

Some of the GETs are useful at all times, and others are useful in part "naked", but also with a "before spike" snapshot to compare with.

Just gives general information about your current cluster's nodes.

A list of current indices.

Some stats of merges ...

...and segments.

cjpowers · March 24, 2026, 9:26pm

Hi Kevin, Yes, its currently happening, I also increased the hot pods to 3:

GET _cat/nodes?v&h=name,role,version,master,u,cpu,disk.used,disk.avail,disk.total:

name                          role version master     u cpu disk.used disk.avail disk.total
monitoring-apps-es-master-2   mr   8.19.7  -       4.2d   5   100.8gb     88.3gb    189.1gb
monitoring-apps-es-master-1   mr   8.19.7  -        11d   3   118.8gb     70.3gb    189.1gb
monitoring-apps-es-data-hot-0 dr   8.19.7  -        11d   7   100.8gb     88.3gb    189.1gb
monitoring-apps-es-client-1   ir   8.19.7  -      11.1d  88    57.9gb    131.2gb    189.1gb
monitoring-apps-es-master-0   mr   8.19.7  *      11.1d  88    57.9gb    131.2gb    189.1gb
monitoring-apps-es-data-hot-1 dr   8.19.7  -       4.2d   3   118.8gb     70.3gb    189.1gb
monitoring-apps-es-client-0   ir   8.19.7  -        11d   6    92.4gb     96.7gb    189.1gb
monitoring-apps-es-data-hot-2 dr   8.19.7  -         1d   8    92.4gb     96.7gb    189.1gb

GET _cat/indices?v&s=index:

health status index                                                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
green  open   .ds-logs-2026.03.24-000034                                         yLBo4lKXQpyP1fDbsBvNfQ   2   1  184650361            0     57.3gb         30.9gb       30.9gb
green  open   .internal.alerts-dataset.quality.alerts-default-000001             bH17cS25Sla2zz56bUp8tg   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-default.alerts-default-000001                     kBpShT_oRLWWLk0Tr70wTQ   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-ml.anomaly-detection-health.alerts-default-000001 iTg80HBsR-i59uVB9ya80Q   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-ml.anomaly-detection.alerts-default-000001        QLVAV-0bRDSRAXMeGfJ8aw   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-observability.apm.alerts-default-000001           _deFnzO0STKjvTWyugtV2A   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-observability.logs.alerts-default-000001          PjJx944LTGW31A_Mde_Kog   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-observability.metrics.alerts-default-000001       TZL8bCKxT9yx8x_kWxawqg   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-observability.slo.alerts-default-000001           hdp15iRQSPm1rvD4hYrl6g   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-observability.threshold.alerts-default-000001     _bXEGMQOReW670Atysb8PQ   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-observability.uptime.alerts-default-000001        gxLx2xm2T5Kbb8QQ0VbPxw   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-security.alerts-default-000001                    aUHUXrEzR_K8VbCE93ssbA   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-security.attack.discovery.alerts-default-000001   fHyPcx8KRzKd6zeYx3vDmA   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-stack.alerts-default-000001                       Pi78A93_SYSszqgeflcCLQ   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-streams.alerts-default-000001                     iLZlsJf7SmCaO48RHAGIOA   1   1          0            0       498b           249b         249b
green  open   .internal.alerts-transform.health.alerts-default-000001            Q7WL4V1yRI21B4DdBSTQUA   1   1          0            0       498b           249b         249b
green  open   shrink-mjim-.ds-logs-2026.03.23-000031                             R-TM9rT9RnSb9o_N9PDiTg   1   1  224771551            0     76.6gb         38.2gb       38.2gb

GET _nodes/stats/indices/merge

{
  "_nodes": {
    "total": 8,
    "successful": 8,
    "failed": 0
  },
  "cluster_name": "monitoring-apps",
  "nodes": {
    "84184n_yQPibCTZBesmROA": {
      "timestamp": 1774387471030,
      "name": "monitoring-apps-es-data-hot-1",
      "transport_address": "10.42.2.9:9300",
      "host": "10.42.2.9",
      "ip": "10.42.2.9:9300",
      "roles": [
        "data",
        "remote_cluster_client"
      ],
      "attributes": {
        "ml.config_version": "12.0.0",
        "transform.config_version": "10.0.0",
        "k8s_node_name": "dsk8cpdmzd3.dmzqa.local",
        "box_type": "hot",
        "xpack.installed": "true"
      },
      "indices": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 8244,
          "total_time_in_millis": 69013468,
          "total_docs": 2060716273,
          "total_size_in_bytes": 407077421332,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 57912419,
          "total_auto_throttle_in_bytes": 356515840
        }
      }
    },
    "bYH966e8SVKxMCWzfsxJig": {
      "timestamp": 1774387471030,
      "name": "monitoring-apps-es-master-1",
      "transport_address": "10.42.2.224:9300",
      "host": "10.42.2.224",
      "ip": "10.42.2.224:9300",
      "roles": [
        "master",
        "remote_cluster_client"
      ],
      "attributes": {
        "transform.config_version": "10.0.0",
        "xpack.installed": "true",
        "k8s_node_name": "dsk8cpdmzd3.dmzqa.local",
        "ml.config_version": "12.0.0"
      },
      "indices": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 0,
          "total_time_in_millis": 0,
          "total_docs": 0,
          "total_size_in_bytes": 0,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 0
        }
      }
    },
    "rrHuXzVfTo6Z0KfOmZS2tA": {
      "timestamp": 1774387471031,
      "name": "monitoring-apps-es-client-0",
      "transport_address": "10.42.1.240:9300",
      "host": "10.42.1.240",
      "ip": "10.42.1.240:9300",
      "roles": [
        "ingest",
        "remote_cluster_client"
      ],
      "attributes": {
        "transform.config_version": "10.0.0",
        "xpack.installed": "true",
        "k8s_node_name": "dsk8cpdmzd2.dmzqa.local",
        "ml.config_version": "12.0.0"
      },
      "indices": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 0,
          "total_time_in_millis": 0,
          "total_docs": 0,
          "total_size_in_bytes": 0,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 0
        }
      }
    },
    "5qO9yl6sR_qLtoM0IDnd4A": {
      "timestamp": 1774387471033,
      "name": "monitoring-apps-es-data-hot-2",
      "transport_address": "10.42.1.26:9300",
      "host": "10.42.1.26",
      "ip": "10.42.1.26:9300",
      "roles": [
        "data",
        "remote_cluster_client"
      ],
      "attributes": {
        "ml.config_version": "12.0.0",
        "transform.config_version": "10.0.0",
        "k8s_node_name": "dsk8cpdmzd2.dmzqa.local",
        "box_type": "hot",
        "xpack.installed": "true"
      },
      "indices": {
        "merges": {
          "current": 1,
          "current_docs": 8790852,
          "current_size_in_bytes": 2019741130,
          "total": 1661,
          "total_time_in_millis": 13002362,
          "total_docs": 359177991,
          "total_size_in_bytes": 78221161851,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 10936279,
          "total_auto_throttle_in_bytes": 199229440
        }
      }
    },
    "p0LSSj7QRiK0MKAk8Ai7EA": {
      "timestamp": 1774387471031,
      "name": "monitoring-apps-es-master-0",
      "transport_address": "10.42.0.87:9300",
      "host": "10.42.0.87",
      "ip": "10.42.0.87:9300",
      "roles": [
        "master",
        "remote_cluster_client"
      ],
      "attributes": {
        "xpack.installed": "true",
        "transform.config_version": "10.0.0",
        "k8s_node_name": "dsk8cpdmzd1.dmzqa.local",
        "ml.config_version": "12.0.0"
      },
      "indices": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 0,
          "total_time_in_millis": 0,
          "total_docs": 0,
          "total_size_in_bytes": 0,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 0
        }
      }
    },
    "i4MWTxqtQaiSJlsCcuw1eA": {
      "timestamp": 1774387471034,
      "name": "monitoring-apps-es-data-hot-0",
      "transport_address": "10.42.3.199:9300",
      "host": "10.42.3.199",
      "ip": "10.42.3.199:9300",
      "roles": [
        "data",
        "remote_cluster_client"
      ],
      "attributes": {
        "ml.config_version": "12.0.0",
        "transform.config_version": "10.0.0",
        "k8s_node_name": "dsk8wkdmzd1.dmzqa.local",
        "box_type": "hot",
        "xpack.installed": "true"
      },
      "indices": {
        "merges": {
          "current": 1,
          "current_docs": 7311608,
          "current_size_in_bytes": 1963075239,
          "total": 27126,
          "total_time_in_millis": 187830289,
          "total_docs": 4132581929,
          "total_size_in_bytes": 1014277036893,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 140358108,
          "total_auto_throttle_in_bytes": 513802240
        }
      }
    },
    "uB8YS-vuTJ6tb8ONCi5prQ": {
      "timestamp": 1774387471036,
      "name": "monitoring-apps-es-client-1",
      "transport_address": "10.42.0.85:9300",
      "host": "10.42.0.85",
      "ip": "10.42.0.85:9300",
      "roles": [
        "ingest",
        "remote_cluster_client"
      ],
      "attributes": {
        "xpack.installed": "true",
        "transform.config_version": "10.0.0",
        "k8s_node_name": "dsk8cpdmzd1.dmzqa.local",
        "ml.config_version": "12.0.0"
      },
      "indices": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 0,
          "total_time_in_millis": 0,
          "total_docs": 0,
          "total_size_in_bytes": 0,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 0
        }
      }
    },
    "kL7rMobiRX6lg_ovXXBS5g": {
      "timestamp": 1774387471037,
      "name": "monitoring-apps-es-master-2",
      "transport_address": "10.42.3.221:9300",
      "host": "10.42.3.221",
      "ip": "10.42.3.221:9300",
      "roles": [
        "master",
        "remote_cluster_client"
      ],
      "attributes": {
        "xpack.installed": "true",
        "transform.config_version": "10.0.0",
        "k8s_node_name": "dsk8wkdmzd1.dmzqa.local",
        "ml.config_version": "12.0.0"
      },
      "indices": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 0,
          "total_time_in_millis": 0,
          "total_docs": 0,
          "total_size_in_bytes": 0,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 0
        }
      }
    }
  }
}

GET _cat/segments?v:
This has 400 lines and won't fit here but here is an excerpt

index                                  shard prirep ip          segment generation docs.count docs.deleted    size size.memory committed searchable version compound
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1s             64      76424            0  14.3mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1t             65         21            0  99.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1u             66         26            0  99.1kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1v             67       3218            0 708.9kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1w             68       4656            0   974kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1x             69       4746            0     1mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1y             70       2903            0 642.4kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1z             71        281            0 150.1kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20             72       2760            0 611.7kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _21             73          8            0  77.4kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _22             74       4984            0     1mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _23             75       4849            0     1mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _24             76       3036            0 554.3kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _25             77         12            0  88.4kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _26             78       4599            0   985kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _27             79        114            0 130.8kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _28             80       3121            0   690kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _29             81       4896            0     1mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2a             82       3256            0 762.9kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2b             83       2676            0 605.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2c             84       3177            0 677.8kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2d             85       3122            0 724.5kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2e             86          7            0  55.8kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2f             87       3002            0 645.3kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2g             88       6454            0   1.2mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2h             89       4607            0 974.5kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2i             90         31            0  94.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2j             91       3196            0 791.7kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _2k             92       4552            0 971.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _x4           1192   61670196            0   8.1gb           0 true      false      9.12.2  false
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _13e          1418   10075435            0   1.3gb           0 true      false      9.12.2  false
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1xh          2501   19501137            0   3.3gb           0 true      false      9.12.2  false
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1yc          2532     370947            0  90.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1z8          2564       8611            0   3.5mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1z9          2565         31            0  92.7kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1za          2566     290185            0  67.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zb          2567      17310            0   4.5mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zc          2568       8673            0   2.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zd          2569      11026            0   3.5mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1ze          2570      16313            0   4.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zf          2571      12441            0     4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zg          2572      36544            0  10.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zh          2573      44718            0  11.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zi          2574       7208            0   1.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zj          2575         61            0 188.7kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zk          2576          3            0  42.4kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zl          2577      40148            0   9.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zm          2578      28801            0   7.7mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zn          2579      28390            0   7.7mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zo          2580         24            0 102.6kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zp          2581          8            0    54kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zq          2582      11099            0   3.5mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zr          2583      15003            0   4.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zs          2584       5880            0   2.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zt          2585         90            0  93.8kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zu          2586       9159            0   3.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zv          2587      10043            0   3.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zw          2588       6476            0   2.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zx          2589         15            0  96.7kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zy          2590       4430            0   2.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _1zz          2591       9476            0   3.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _200          2592      16663            0   4.7mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _201          2593         38            0 113.5kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _202          2594      14867            0   4.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _203          2595       8362            0   2.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _204          2596       4862            0   2.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _205          2597        272            0 218.8kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _206          2598         80            0  59.5kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _207          2599          5            0  55.1kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _208          2600       9760            0   3.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _209          2601      10591            0   3.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20a          2602       1907            0     1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20b          2603       4386            0   1.5mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20c          2604          2            0    62kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20d          2605       9667            0   3.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20e          2606       8022            0     3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20f          2607       8723            0   3.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20g          2608        175            0  84.5kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20h          2609       4703            0   2.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20i          2610       5946            0   2.7mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     r      10.42.2.9   _20j          2611       9504            0   2.6mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2o             96       4334            0 952.5kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2p             97       1563            0 396.6kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2q             98       1442            0   434kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2r             99        591            0 125.4kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2s            100        287            0 186.1kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2t            101       3081            0 689.8kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2u            102       3119            0   674kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2v            103       1275            0 288.8kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2w            104      84345            0  15.7mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2x            105        520            0 211.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2y            106        858            0 265.5kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2z            107       3070            0 749.9kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _30            108       1225            0 321.6kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _31            109       2254            0 486.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _32            110       3999            0 909.8kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _33            111       3380            0 774.6kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _34            112       2227            0 524.4kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _35            113       1793            0 455.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _36            114        135            0  58.7kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _37            115       5032            0     1mb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _38            116       3137            0 747.9kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _39            117       1381            0 326.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3a            118       2254            0 496.1kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3b            119       3123            0   681kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3c            120       2893            0 636.2kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3d            121        298            0 175.3kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3e            122         44            0  47.7kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3f            123       1052            0 285.3kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3g            124        407            0 136.4kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3h            125       2114            0 551.6kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3i            126       2227            0 525.1kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3j            127        825            0 166.1kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3k            128       1691            0 391.9kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _3l            129        979            0 193.8kb           0 false     true       9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _y7           1231   46858626            0   6.4gb           0 true      false      9.12.2  false
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _1aq          1682   16345481            0   2.2gb           0 true      false      9.12.2  false
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _1g6          1878    5909609            0 872.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2a4          2956   15792137            0   2.8gb           0 true      false      9.12.2  false
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2cp          3049     433761            0 116.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2dm          3082     468858            0  97.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2el          3117     316784            0  59.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2fh          3149     335821            0  71.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2gb          3179     314625            0  70.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2hc          3216     290072            0  63.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2i7          3247     404077            0  91.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2j1          3277     249144            0  58.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2k3          3315     302219            0  68.6mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2l2          3350     248629            0  57.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2lv          3379     244012            0  60.7mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2mt          3413     222199            0  57.6mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2nn          3443     200489            0  54.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2on          3479     186161            0  47.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2ph          3509     293782            0  74.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2qg          3544     178858            0  46.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2rd          3577     241201            0  62.5mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2s9          3609     222896            0    59mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2t7          3643     163417            0  44.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2u5          3677     178874            0  49.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2v2          3710     185186            0  54.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2w2          3746     178821            0  52.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2wy          3778     177395            0  51.4mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2xp          3805     148422            0  44.5mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2yp          3841     133579            0  39.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _2zp          3877     135598            0  46.6mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _30j          3907     171564            0  54.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _31h          3941      95142            0  31.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _32c          3972     171204            0  51.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _33g          4012     120980            0  36.8mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _348          4040     118336            0  37.9mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _351          4069     179502            0  57.3mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _35v          4099       1484            0 776.1kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _35y          4102         35            0   110kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _35z          4103         17            0  62.8kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _360          4104      78643            0    28mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _361          4105       4823            0   2.2mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _362          4106       4341            0   2.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _363          4107       4594            0   2.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _364          4108       3797            0   1.7mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _365          4109       2136            0   1.1mb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _366          4110       1382            0 849.1kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _367          4111        576            0 529.5kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _368          4112        203            0 235.5kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _369          4113        135            0 172.1kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _36a          4114         13            0  70.1kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _36b          4115        125            0 130.4kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _36c          4116        144            0 140.7kb           0 true      false      9.12.2  true
.ds-logs-2026.03.24-000034             0     p      10.42.3.199 _36d          4117         53            0  60.7kb           0 true      false      9.12.2  true

RainTown · March 24, 2026, 11:39pm

thanks for output. The rest of the segments output would help a bit, specifically that for the other main index, shrink-mjim-.ds-logs-2026.03.23-000031 . and also a GET on

_cat/shards?v&bytes=b

But ...

monitoring-apps-es-data-hot-2 dr   8.19.7  -         1d   8    92.4gb     96.7gb    189.1gb

this node is up for one day

    "5qO9yl6sR_qLtoM0IDnd4A": {
      "timestamp": 1774387471033,
      "name": "monitoring-apps-es-data-hot-2",
,,,
      "indices": {
        "merges": {
          "current": 1,
          "current_docs": 8790852,
          "current_size_in_bytes": 2019741130,
          "total": 1661,
          "total_time_in_millis": 13002362,
          "total_docs": 359177991,
          "total_size_in_bytes": 78221161851,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 10936279,
          "total_auto_throttle_in_bytes": 199229440
        }
      }
    }

"total_throttled_time_in_millis": 10936279

That's over 3 hours already. My hunch is your (network) IO is just not fast enough to keep up once enough segment merges are on the todo list, or at least to keep it continually reading data at high enough volume to explain what you are seeing.

If changing the IO subsystem is not an option, and I repeat again that the recommendation is to use local storage where possible, then you likely need to better tune your ECK setup, where there are better people than me who can advise - specifically I cant quite marry

  logging:
    number_of_shards: 2
    number_of_replicas: 0
    rollover: 
      max_age: "1d"
      max_primary_shard_size: "20GB"
    shrink:
      number_of_shards: 1
    delete:
      min_age: "1d"

with the entirety of what you shared.

Might be helpful for you to explain, in just normal English, what the idea is here with your actual important log data? i.e. Describe in words the desired lifecycle from arrival at ES to it (the log, or the index containing the log) being deleted.

Christian_Dahlqvist · March 25, 2026, 6:32am

The first thing I would do is to remove the shrink step. It does IMHO not really make any sense to have it in there given that you seem to have a very short retention period and it causes additional merging, which seems to be the bottleneck due to slow storage.

cjpowers · March 27, 2026, 3:04pm

This ECK stack is part of a Kubernetes installation from a private company, so its pre-configured (with exposed overrides), they haven't seen this issue with any of their other customers, so we are trying figure out what is different to cause this issue.

We moved the VM to local storage on our servers, and removed shrinks, and it looks like the issue further increased to ~15GB/s.

After the rollover, it calmed back down to ~5GB/s, and it looks like all on one pod at a time, instead of stacking. What I'm still trying to understand is where that data is going. at those speeds its reading hundreds of terabytes per day. Before the trigger, it's a maximum of about ~200MB/s when it peaks.

This is a different cluster that hasn't had the trigger:

additionally after the changes, the "total_throttled_time_in_millis" increased ~8X

from: 10936279
to:   87298520

cjpowers · March 27, 2026, 3:15pm

Full segments:
https://pastes.io/mv4IO0tU

GET _cat/shards?v&bytes=b:

index                                                              shard prirep state           docs       store     dataset ip          node
.slo-observability.summary-v3.5.temp                               0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.slo-observability.summary-v3.5.temp                               0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-.logs-deprecation.elasticsearch-default-2026.03.13-000001      0     p      STARTED           33      296787      296787 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-.logs-deprecation.elasticsearch-default-2026.03.13-000001      0     r      STARTED           33      296788      296788 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-observability.logs.alerts-default-000001          0     r      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-observability.logs.alerts-default-000001          0     p      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-observability.uptime.alerts-default-000001        0     p      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-observability.uptime.alerts-default-000001        0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-.kibana-event-log-ds-2026.03.20-000002                         0     p      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-.kibana-event-log-ds-2026.03.20-000002                         0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-dataset.quality.alerts-default-000001             0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-dataset.quality.alerts-default-000001             0     p      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.kibana_locks-000001                                               0     p      STARTED            0         250         250 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana_locks-000001                                               0     r      STARTED            0         250         250 10.42.1.26  monitoring-apps-es-data-hot-2
.kibana_alerting_cases_8.19.7_001                                  0     r      STARTED            1        7129        7129 10.42.1.26  monitoring-apps-es-data-hot-2
.kibana_alerting_cases_8.19.7_001                                  0     p      STARTED            1        7129        7129 10.42.2.9   monitoring-apps-es-data-hot-1
.kibana-siem-rule-migrations-integrations                          0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana-siem-rule-migrations-integrations                          0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.kibana_ingest_8.19.7_001                                          0     p      STARTED          369     1078643     1078643 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana_ingest_8.19.7_001                                          0     r      STARTED          369     1078643     1078643 10.42.2.9   monitoring-apps-es-data-hot-1
.security-7                                                        0     p      STARTED          265      688924      688924 10.42.3.199 monitoring-apps-es-data-hot-0
.security-7                                                        0     r      STARTED          265      688924      688924 10.42.1.26  monitoring-apps-es-data-hot-2
.apm-source-map                                                    0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.apm-source-map                                                    0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.slo-observability.sli-v3.5                                        0     p      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.slo-observability.sli-v3.5                                        0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.inference                                                         0     r      STARTED            3       17627       17627 10.42.1.26  monitoring-apps-es-data-hot-2
.inference                                                         0     p      STARTED            3       17627       17627 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-streams.alerts-default-000001                     0     r      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-streams.alerts-default-000001                     0     p      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.transform-internal-007                                            0     p      STARTED            4       48322       48322 10.42.3.199 monitoring-apps-es-data-hot-0
.transform-internal-007                                            0     r      STARTED            4       48322       48322 10.42.2.9   monitoring-apps-es-data-hot-1
.kibana_task_manager_8.19.7_001                                    0     p      STARTED           42      105793      105793 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana_task_manager_8.19.7_001                                    0     r      STARTED           42      266815      266815 10.42.2.9   monitoring-apps-es-data-hot-1
.secrets-inference                                                 0     p      STARTED            3        9654        9654 10.42.3.199 monitoring-apps-es-data-hot-0
.secrets-inference                                                 0     r      STARTED            3        9654        9654 10.42.1.26  monitoring-apps-es-data-hot-2
.kibana-siem-rule-migrations-prebuiltrules                         0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana-siem-rule-migrations-prebuiltrules                         0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-.kibana-event-log-ds-2026.03.13-000001                         0     p      STARTED            5       31637       31637 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-.kibana-event-log-ds-2026.03.13-000001                         0     r      STARTED            5       31637       31637 10.42.2.9   monitoring-apps-es-data-hot-1
.kibana_usage_counters_8.19.7_001                                  0     p      STARTED          141       64858       64858 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana_usage_counters_8.19.7_001                                  0     r      STARTED          141      137690      137690 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.27-000043                                         0     r      STARTED     72033421 11850451002 11850451002 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-logs-2026.03.27-000043                                         0     p      STARTED     72034876 15338102226 15338102226 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.27-000043                                         1     r      STARTED     72021231 11797610377 11797610377 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-logs-2026.03.27-000043                                         1     p      STARTED     72027036 16033181001 16033181001 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.27-000043                                         2     r      STARTED     72014215 14876121193 14876121193 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.27-000043                                         2     p      STARTED     72019706 12990204630 12990204630 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.26-000040                                         0     r      STARTED    115092793 20250835213 20250835213 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.26-000040                                         0     p      STARTED    115092793 20250835213 20250835213 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.26-000040                                         1     p      STARTED    115098363 20448047806 20448047806 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.26-000040                                         1     r      STARTED    115098363 19133282394 19133282394 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.26-000040                                         2     p      STARTED    115090223 21494945757 21494945757 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-logs-2026.03.26-000040                                         2     r      STARTED    115090223 19122090258 19122090258 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-security.attack.discovery.alerts-default-000001   0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-security.attack.discovery.alerts-default-000001   0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-observability.apm.alerts-default-000001           0     p      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-observability.apm.alerts-default-000001           0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-observability.metrics.alerts-default-000001       0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-observability.metrics.alerts-default-000001       0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-transform.health.alerts-default-000001            0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-transform.health.alerts-default-000001            0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.kibana_security_session_1                                         0     p      STARTED            3       19068       19068 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana_security_session_1                                         0     r      STARTED            3       19068       19068 10.42.2.9   monitoring-apps-es-data-hot-1
.kibana_analytics_8.19.7_001                                       0     r      STARTED           61     2506436     2506436 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana_analytics_8.19.7_001                                       0     p      STARTED           61     2506436     2506436 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-stack.alerts-default-000001                       0     p      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-stack.alerts-default-000001                       0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-ml.anomaly-detection.alerts-default-000001        0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-ml.anomaly-detection.alerts-default-000001        0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.kibana_8.19.7_001                                                 0     p      STARTED          186      153699      153699 10.42.3.199 monitoring-apps-es-data-hot-0
.kibana_8.19.7_001                                                 0     r      STARTED          186      157528      157528 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-ilm-history-7-2026.03.27-000003                                0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-ilm-history-7-2026.03.27-000003                                0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.kibana_security_solution_8.19.7_001                               0     p      STARTED         4339    14759437    14759437 10.42.1.26  monitoring-apps-es-data-hot-2
.kibana_security_solution_8.19.7_001                               0     r      STARTED         4339    14759437    14759437 10.42.2.9   monitoring-apps-es-data-hot-1
.apm-custom-link                                                   0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.apm-custom-link                                                   0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.async-search                                                      0     p      STARTED            0         253         253 10.42.3.199 monitoring-apps-es-data-hot-0
.async-search                                                      0     r      STARTED            0         253         253 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-observability.threshold.alerts-default-000001     0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-observability.threshold.alerts-default-000001     0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-ilm-history-7-2026.03.20-000002                                0     r      STARTED          225      341303      341303 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-ilm-history-7-2026.03.20-000002                                0     p      STARTED          225      341303      341303 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-default.alerts-default-000001                     0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-default.alerts-default-000001                     0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.slo-observability.summary-v3.5                                    0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.slo-observability.summary-v3.5                                    0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
shrink-4hvu-.ds-logs-2026.03.26-000040                             0     p      UNASSIGNED                                               
shrink-4hvu-.ds-logs-2026.03.26-000040                             0     r      UNASSIGNED                                               
.apm-agent-configuration                                           0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.apm-agent-configuration                                           0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-ml.anomaly-detection-health.alerts-default-000001 0     p      STARTED            0         249         249 10.42.3.199 monitoring-apps-es-data-hot-0
.internal.alerts-ml.anomaly-detection-health.alerts-default-000001 0     r      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-observability.slo.alerts-default-000001           0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-observability.slo.alerts-default-000001           0     p      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.internal.alerts-security.alerts-default-000001                    0     r      STARTED            0         249         249 10.42.1.26  monitoring-apps-es-data-hot-2
.internal.alerts-security.alerts-default-000001                    0     p      STARTED            0         249         249 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-ilm-history-7-2026.03.13-000001                                0     p      STARTED          264      219549      219549 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-ilm-history-7-2026.03.13-000001                                0     r      STARTED          264      219549      219549 10.42.2.9   monitoring-apps-es-data-hot-1
.transform-notifications-000002                                    0     p      STARTED            2        7220        7220 10.42.3.199 monitoring-apps-es-data-hot-0
.transform-notifications-000002                                    0     r      STARTED            2        7220        7220 10.42.2.9   monitoring-apps-es-data-hot-1

RainTown · March 27, 2026, 4:12pm

Yes, this is perplexing.

When it's next going a bit mad, dump the hot_threads. via

GET /_nodes/hot_threads

and any tasks

GET /_tasks

Can you also share your new, updated cluster config, similar to what you shared in first post. I wonder if it's got into some weird "try to balance cluster", chasing its tail, loop? There could also be a rogue query/script trying to do funky stuff you don't know about.

In the shards output, all I see that its semi-interesting are:

index                                                              shard prirep state           docs       store     dataset ip          node
.ds-logs-2026.03.26-000040                                         0     p      STARTED    115092793 20250835213 20250835213 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.26-000040                                         1     r      STARTED    115098363 19133282394 19133282394 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.27-000043                                         1     p      STARTED     72027036 16033181001 16033181001 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.27-000043                                         2     p      STARTED     72019706 12990204630 12990204630 10.42.2.9   monitoring-apps-es-data-hot-1
.ds-logs-2026.03.26-000040                                         0     r      STARTED    115092793 20250835213 20250835213 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.26-000040                                         1     p      STARTED    115098363 20448047806 20448047806 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.26-000040                                         2     r      STARTED    115090223 19122090258 19122090258 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.27-000043                                         0     p      STARTED     72034876 15338102226 15338102226 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.27-000043                                         2     r      STARTED     72014215 14876121193 14876121193 10.42.1.26  monitoring-apps-es-data-hot-2
.ds-logs-2026.03.26-000040                                         2     p      STARTED    115090223 21494945757 21494945757 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-logs-2026.03.27-000043                                         0     r      STARTED     72033421 11850451002 11850451002 10.42.3.199 monitoring-apps-es-data-hot-0
.ds-logs-2026.03.27-000043                                         1     r      STARTED     72021231 11797610377 11797610377 10.42.3.199 monitoring-apps-es-data-hot-0

not perfectly balancing the 2x3x2=12 "big" shards (4/4/4) across your 3 hot nodes. But not terribly skewed either (4/5/3)

and

shrink-4hvu-.ds-logs-2026.03.26-000040                             0     p      UNASSIGNED                                               
shrink-4hvu-.ds-logs-2026.03.26-000040                             0     r      UNASSIGNED

is surely a remnant of you changing the shrink config.

There's also the per-index graphs in Kibana, where you can see queries and indexing and other stats per index.

EDIT, also helpful to see the actual policy in use is, after the config has changed:

GET _ilm/policy/your-policy--name

You can do a GET on _ilm/policy and find the appropriate one for the logs data stream.

I really hope someone else has some great ideas too.

cjpowers · March 27, 2026, 4:21pm

eck-apps:
  enabled: true
  elasticsearch:
    externalLoggingEnabled: false
    master:
      replicas: 3
      storage: 6Gi
      memory: 6Gi
    dataHot:
      replicas: 3
      storage: 50Gi
      memory: 8Gi
      cpu: 1500m
    client:
      replicas: 2
      storage: 6Gi
      memory: 2Gi
    clientExternal:
      replicas: 2
      memory: 4Gi
  logging:
    number_of_shards: 3
    number_of_replicas: 1
    rollover: # https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-rollover.html#ilm-rollover-options
      max_age: "1d" # Triggers rollover after the maximum elapsed time from index creation is reached
      max_primary_shard_size: "20GB" # Triggers rollover when the index reaches a certain size
    shrink:
      number_of_shards: 0 #This didn't work, had to turn off in Kibana
    delete: # https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-delete.html#ilm-delete-options
      min_age: "1d"

GET /_nodes/hot_threads:

::: {monitoring-apps-es-data-hot-0}{i4MWTxqtQaiSJlsCcuw1eA}{1CVRJcdrT2e_Kx9zV-QMug}{monitoring-apps-es-data-hot-0}{10.42.3.199}{10.42.3.199:9300}{dr}{8.19.7}{7000099-8536000}{k8s_node_name=dsk8wkdmzd1.dmzqa.local, box_type=hot, xpack.installed=true, ml.config_version=12.0.0, transform.config_version=10.0.0}
   Hot threads at 2026-03-27T16:16:51.192Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {monitoring-apps-es-master-2}{kL7rMobiRX6lg_ovXXBS5g}{CQ9TYlhgTM60URtClSYGKw}{monitoring-apps-es-master-2}{10.42.3.221}{10.42.3.221:9300}{mr}{8.19.7}{7000099-8536000}{ml.config_version=12.0.0, xpack.installed=true, transform.config_version=10.0.0, k8s_node_name=dsk8wkdmzd1.dmzqa.local}
   Hot threads at 2026-03-27T16:16:51.194Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {monitoring-apps-es-client-0}{rrHuXzVfTo6Z0KfOmZS2tA}{48X5oRYjT6C4NR-QZ9ri8w}{monitoring-apps-es-client-0}{10.42.1.240}{10.42.1.240:9300}{ir}{8.19.7}{7000099-8536000}{ml.config_version=12.0.0, transform.config_version=10.0.0, xpack.installed=true, k8s_node_name=dsk8cpdmzd2.dmzqa.local}
   Hot threads at 2026-03-27T16:16:51.197Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   100.0% [cpu=0.0%, other=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[readiness-service]'
     10/10 snapshots sharing following 7 elements
       java.base@25.0.1/sun.nio.ch.Net.accept(Native Method)
       java.base@25.0.1/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:424)
       java.base@25.0.1/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.readiness.ReadinessService.lambda$startListener$0(ReadinessService.java:178)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.readiness.ReadinessService$$Lambda/0x000000000e3d5620.run(Unknown Source)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)

::: {monitoring-apps-es-data-hot-2}{5qO9yl6sR_qLtoM0IDnd4A}{me2AAafmTmGFrfRhInSfCw}{monitoring-apps-es-data-hot-2}{10.42.1.26}{10.42.1.26:9300}{dr}{8.19.7}{7000099-8536000}{k8s_node_name=dsk8cpdmzd2.dmzqa.local, box_type=hot, xpack.installed=true, ml.config_version=12.0.0, transform.config_version=10.0.0}
   Hot threads at 2026-03-27T16:16:51.196Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.2% [cpu=0.2%, idle=99.8%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[monitoring-apps-es-data-hot-2][transport_worker][T#15]'
     unique snapshot
       java.base@25.0.1/sun.nio.ch.SocketDispatcher.write0(Native Method)
       java.base@25.0.1/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:65)
       java.base@25.0.1/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:137)
       java.base@25.0.1/sun.nio.ch.IOUtil.write(IOUtil.java:81)
       java.base@25.0.1/sun.nio.ch.IOUtil.write(IOUtil.java:58)
       java.base@25.0.1/sun.nio.ch.SocketChannelImpl.implWrite(SocketChannelImpl.java:562)
       java.base@25.0.1/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:614)
       org.elasticsearch.transport.netty4@8.19.7/org.elasticsearch.transport.netty4.CopyBytesSocketChannel.writeToSocketChannel(CopyBytesSocketChannel.java:127)
       org.elasticsearch.transport.netty4@8.19.7/org.elasticsearch.transport.netty4.CopyBytesSocketChannel.doWrite(CopyBytesSocketChannel.java:96)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:929)
       io.netty.transport@4.1.126.Final/io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:359)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:893)
       io.netty.transport@4.1.126.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1319)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:935)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:921)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:907)
       io.netty.handler@4.1.126.Final/io.netty.handler.ssl.SslHandler.forceFlush(SslHandler.java:2304)
       io.netty.handler@4.1.126.Final/io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:831)
       io.netty.handler@4.1.126.Final/io.netty.handler.ssl.SslHandler.flush(SslHandler.java:808)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:941)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:921)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:907)
       org.elasticsearch.transport.netty4@8.19.7/org.elasticsearch.transport.netty4.Netty4WriteThrottlingHandler.doFlush(Netty4WriteThrottlingHandler.java:209)
       org.elasticsearch.transport.netty4@8.19.7/org.elasticsearch.transport.netty4.Netty4WriteThrottlingHandler.flush(Netty4WriteThrottlingHandler.java:150)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:937)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:957)
       io.netty.transport@4.1.126.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1263)
       io.netty.common@4.1.126.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
       io.netty.common@4.1.126.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
       io.netty.common@4.1.126.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
       io.netty.transport@4.1.126.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
       io.netty.common@4.1.126.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
       io.netty.common@4.1.126.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)

::: {monitoring-apps-es-master-1}{bYH966e8SVKxMCWzfsxJig}{sEi8ESX6SeOWynte0vpnbg}{monitoring-apps-es-master-1}{10.42.2.224}{10.42.2.224:9300}{mr}{8.19.7}{7000099-8536000}{ml.config_version=12.0.0, transform.config_version=10.0.0, xpack.installed=true, k8s_node_name=dsk8cpdmzd3.dmzqa.local}
   Hot threads at 2026-03-27T16:16:51.195Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {monitoring-apps-es-data-hot-1}{84184n_yQPibCTZBesmROA}{gyIDuSXCRo2HwT9xz6UE8A}{monitoring-apps-es-data-hot-1}{10.42.2.9}{10.42.2.9:9300}{dr}{8.19.7}{7000099-8536000}{k8s_node_name=dsk8cpdmzd3.dmzqa.local, box_type=hot, xpack.installed=true, ml.config_version=12.0.0, transform.config_version=10.0.0}
   Hot threads at 2026-03-27T16:16:51.196Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   100.0% [cpu=35.3%, other=64.7%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[monitoring-apps-es-data-hot-1][write][T#18]'
     5/10 snapshots sharing following 30 elements
       app/org.apache.lucene.core@9.12.2/org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:185)
       app/org.apache.lucene.core@9.12.2/org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:507)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.getDocID(PerThreadIDVersionAndSeqNoLookup.java:146)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.lookupVersion(PerThreadIDVersionAndSeqNoLookup.java:121)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.lucene.uid.VersionsAndSeqNoResolver.timeSeriesLoadDocIdAndVersion(VersionsAndSeqNoResolver.java:140)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.lambda$resolveDocVersion$7(InternalEngine.java:1035)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine$$Lambda/0x000000007c54a740.apply(Unknown Source)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.performActionWithDirectoryReader(InternalEngine.java:3548)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.resolveDocVersion(InternalEngine.java:1028)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary(InternalEngine.java:1343)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1320)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:1182)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:1105)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1017)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:935)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:382)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:239)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:307)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:155)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:82)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:220)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)
     5/10 snapshots sharing following 11 elements
       java.base@25.0.1/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@25.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:369)
       java.base@25.0.1/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:458)
       java.base@25.0.1/java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:613)
       java.base@25.0.1/java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1257)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:153)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1016)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)
   
   100.0% [cpu=34.6%, other=65.4%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[monitoring-apps-es-data-hot-1][write][T#13]'
     2/10 snapshots sharing following 30 elements
       app/org.apache.lucene.core@9.12.2/org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:185)
       app/org.apache.lucene.core@9.12.2/org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:507)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.getDocID(PerThreadIDVersionAndSeqNoLookup.java:146)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.lookupVersion(PerThreadIDVersionAndSeqNoLookup.java:121)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.lucene.uid.VersionsAndSeqNoResolver.timeSeriesLoadDocIdAndVersion(VersionsAndSeqNoResolver.java:140)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.lambda$resolveDocVersion$7(InternalEngine.java:1035)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine$$Lambda/0x000000007c54a740.apply(Unknown Source)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.performActionWithDirectoryReader(InternalEngine.java:3548)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.resolveDocVersion(InternalEngine.java:1028)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary(InternalEngine.java:1343)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1320)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:1182)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:1105)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1017)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:935)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:382)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:239)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:307)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:155)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:82)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:220)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)
     8/10 snapshots sharing following 11 elements
       java.base@25.0.1/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@25.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:369)
       java.base@25.0.1/java.util.concurrent.LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:458)
       java.base@25.0.1/java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:613)
       java.base@25.0.1/java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1257)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:153)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1016)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
       java.base@25.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)
   
   100.0% [cpu=0.0%, other=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[readiness-service]'
     10/10 snapshots sharing following 7 elements
       java.base@25.0.1/sun.nio.ch.Net.accept(Native Method)
       java.base@25.0.1/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:424)
       java.base@25.0.1/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.readiness.ReadinessService.lambda$startListener$0(ReadinessService.java:178)
       app/org.elasticsearch.server@8.19.7/org.elasticsearch.readiness.ReadinessService$$Lambda/0x000000007c3dd218.run(Unknown Source)
       java.base@25.0.1/java.lang.Thread.runWith(Thread.java:1487)
       java.base@25.0.1/java.lang.Thread.run(Thread.java:1474)

::: {monitoring-apps-es-master-0}{p0LSSj7QRiK0MKAk8Ai7EA}{PzVtbdciTuiJMAB_YcTDOQ}{monitoring-apps-es-master-0}{10.42.0.87}{10.42.0.87:9300}{mr}{8.19.7}{7000099-8536000}{ml.config_version=12.0.0, xpack.installed=true, transform.config_version=10.0.0, k8s_node_name=dsk8cpdmzd1.dmzqa.local}
   Hot threads at 2026-03-27T16:16:51.199Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {monitoring-apps-es-client-1}{uB8YS-vuTJ6tb8ONCi5prQ}{tkIepyB6RRGp-BcjuTp12g}{monitoring-apps-es-client-1}{10.42.0.85}{10.42.0.85:9300}{ir}{8.19.7}{7000099-8536000}{ml.config_version=12.0.0, xpack.installed=true, transform.config_version=10.0.0, k8s_node_name=dsk8cpdmzd1.dmzqa.local}
   Hot threads at 2026-03-27T16:16:51.197Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

cjpowers · March 27, 2026, 4:21pm

GET /_tasks:

{
  "nodes": {
    "rrHuXzVfTo6Z0KfOmZS2tA": {
      "name": "monitoring-apps-es-client-0",
      "transport_address": "10.42.1.240:9300",
      "host": "10.42.1.240",
      "ip": "10.42.1.240:9300",
      "roles": [
        "ingest",
        "remote_cluster_client"
      ],
      "attributes": {
        "transform.config_version": "10.0.0",
        "ml.config_version": "12.0.0",
        "k8s_node_name": "dsk8cpdmzd2.dmzqa.local",
        "xpack.installed": "true"
      },
      "tasks": {
        "rrHuXzVfTo6Z0KfOmZS2tA:30415992": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30415992,
          "type": "transport",
          "action": "indices:data/write/bulk",
          "start_time_in_millis": 1774628429311,
          "running_time_in_nanos": 10427774801,
          "cancellable": false,
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30415993": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30415993,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429315,
          "running_time_in_nanos": 10423941843,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415992",
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30415994": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30415994,
          "type": "transport",
          "action": "indices:data/write/bulk",
          "start_time_in_millis": 1774628429317,
          "running_time_in_nanos": 10422660863,
          "cancellable": false,
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30416347": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30416347,
          "type": "transport",
          "action": "indices:data/write/bulk",
          "start_time_in_millis": 1774628439338,
          "running_time_in_nanos": 401491747,
          "cancellable": false,
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30416348": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30416348,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628439339,
          "running_time_in_nanos": 400740560,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30416347",
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30415996": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30415996,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429319,
          "running_time_in_nanos": 10420088903,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415992",
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30415997": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30415997,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429320,
          "running_time_in_nanos": 10419826873,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415994",
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30416350": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30416350,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628439340,
          "running_time_in_nanos": 399611593,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30416347",
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30415999": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30415999,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429324,
          "running_time_in_nanos": 10415771401,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415994",
          "headers": {}
        },
        "rrHuXzVfTo6Z0KfOmZS2tA:30416351": {
          "node": "rrHuXzVfTo6Z0KfOmZS2tA",
          "id": 30416351,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439735,
          "running_time_in_nanos": 4055070,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        }
      }
    },
    "5qO9yl6sR_qLtoM0IDnd4A": {
      "name": "monitoring-apps-es-data-hot-2",
      "transport_address": "10.42.1.26:9300",
      "host": "10.42.1.26",
      "ip": "10.42.1.26:9300",
      "roles": [
        "data",
        "remote_cluster_client"
      ],
      "attributes": {
        "ml.config_version": "12.0.0",
        "xpack.installed": "true",
        "box_type": "hot",
        "k8s_node_name": "dsk8cpdmzd2.dmzqa.local",
        "transform.config_version": "10.0.0"
      },
      "tasks": {
        "5qO9yl6sR_qLtoM0IDnd4A:15228882": {
          "node": "5qO9yl6sR_qLtoM0IDnd4A",
          "id": 15228882,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439734,
          "running_time_in_nanos": 2272622,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        }
      }
    },
    "uB8YS-vuTJ6tb8ONCi5prQ": {
      "name": "monitoring-apps-es-client-1",
      "transport_address": "10.42.0.85:9300",
      "host": "10.42.0.85",
      "ip": "10.42.0.85:9300",
      "roles": [
        "ingest",
        "remote_cluster_client"
      ],
      "attributes": {
        "xpack.installed": "true",
        "ml.config_version": "12.0.0",
        "k8s_node_name": "dsk8cpdmzd1.dmzqa.local",
        "transform.config_version": "10.0.0"
      },
      "tasks": {
        "uB8YS-vuTJ6tb8ONCi5prQ:30692872": {
          "node": "uB8YS-vuTJ6tb8ONCi5prQ",
          "id": 30692872,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439735,
          "running_time_in_nanos": 13041197,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        }
      }
    },
    "i4MWTxqtQaiSJlsCcuw1eA": {
      "name": "monitoring-apps-es-data-hot-0",
      "transport_address": "10.42.3.199:9300",
      "host": "10.42.3.199",
      "ip": "10.42.3.199:9300",
      "roles": [
        "data",
        "remote_cluster_client"
      ],
      "attributes": {
        "ml.config_version": "12.0.0",
        "xpack.installed": "true",
        "box_type": "hot",
        "k8s_node_name": "dsk8wkdmzd1.dmzqa.local",
        "transform.config_version": "10.0.0"
      },
      "tasks": {
        "i4MWTxqtQaiSJlsCcuw1eA:64": {
          "node": "i4MWTxqtQaiSJlsCcuw1eA",
          "id": 64,
          "type": "persistent",
          "action": "health-node[c]",
          "start_time_in_millis": 1773432799923,
          "running_time_in_nanos": 1195639810911369,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "cluster:26",
          "headers": {}
        },
        "i4MWTxqtQaiSJlsCcuw1eA:65": {
          "node": "i4MWTxqtQaiSJlsCcuw1eA",
          "id": 65,
          "type": "persistent",
          "action": "geoip-downloader[c]",
          "start_time_in_millis": 1773432799924,
          "running_time_in_nanos": 1195639810189660,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "cluster:27",
          "headers": {}
        },
        "i4MWTxqtQaiSJlsCcuw1eA:69861350": {
          "node": "i4MWTxqtQaiSJlsCcuw1eA",
          "id": 69861350,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439734,
          "running_time_in_nanos": 223988,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        }
      }
    },
    "p0LSSj7QRiK0MKAk8Ai7EA": {
      "name": "monitoring-apps-es-master-0",
      "transport_address": "10.42.0.87:9300",
      "host": "10.42.0.87",
      "ip": "10.42.0.87:9300",
      "roles": [
        "master",
        "remote_cluster_client"
      ],
      "attributes": {
        "xpack.installed": "true",
        "ml.config_version": "12.0.0",
        "k8s_node_name": "dsk8cpdmzd1.dmzqa.local",
        "transform.config_version": "10.0.0"
      },
      "tasks": {
        "p0LSSj7QRiK0MKAk8Ai7EA:10243095": {
          "node": "p0LSSj7QRiK0MKAk8Ai7EA",
          "id": 10243095,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439738,
          "running_time_in_nanos": 185702,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        }
      }
    },
    "kL7rMobiRX6lg_ovXXBS5g": {
      "name": "monitoring-apps-es-master-2",
      "transport_address": "10.42.3.221:9300",
      "host": "10.42.3.221",
      "ip": "10.42.3.221:9300",
      "roles": [
        "master",
        "remote_cluster_client"
      ],
      "attributes": {
        "xpack.installed": "true",
        "ml.config_version": "12.0.0",
        "k8s_node_name": "dsk8wkdmzd1.dmzqa.local",
        "transform.config_version": "10.0.0"
      },
      "tasks": {
        "kL7rMobiRX6lg_ovXXBS5g:1670399": {
          "node": "kL7rMobiRX6lg_ovXXBS5g",
          "id": 1670399,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists",
          "start_time_in_millis": 1774628439733,
          "running_time_in_nanos": 1860969,
          "cancellable": true,
          "cancelled": false,
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        },
        "kL7rMobiRX6lg_ovXXBS5g:1670400": {
          "node": "kL7rMobiRX6lg_ovXXBS5g",
          "id": 1670400,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439734,
          "running_time_in_nanos": 1114584,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        }
      }
    },
    "84184n_yQPibCTZBesmROA": {
      "name": "monitoring-apps-es-data-hot-1",
      "transport_address": "10.42.2.9:9300",
      "host": "10.42.2.9",
      "ip": "10.42.2.9:9300",
      "roles": [
        "data",
        "remote_cluster_client"
      ],
      "attributes": {
        "ml.config_version": "12.0.0",
        "xpack.installed": "true",
        "box_type": "hot",
        "k8s_node_name": "dsk8cpdmzd3.dmzqa.local",
        "transform.config_version": "10.0.0"
      },
      "tasks": {
        "84184n_yQPibCTZBesmROA:28645361": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28645361,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628439342,
          "running_time_in_nanos": 396213670,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30416350",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644817": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644817,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429324,
          "running_time_in_nanos": 10414963514,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415997",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28645360": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28645360,
          "type": "transport",
          "action": "indices:data/write/bulk[s][p]",
          "start_time_in_millis": 1774628439341,
          "running_time_in_nanos": 398133696,
          "cancellable": false,
          "parent_task_id": "84184n_yQPibCTZBesmROA:28645359",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644816": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644816,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429323,
          "running_time_in_nanos": 10415233764,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415996",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644819": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644819,
          "type": "transport",
          "action": "indices:data/write/bulk[s][p]",
          "start_time_in_millis": 1774628429324,
          "running_time_in_nanos": 10414641630,
          "cancellable": false,
          "parent_task_id": "84184n_yQPibCTZBesmROA:28644817",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28645362": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28645362,
          "type": "transport",
          "action": "indices:data/write/bulk[s][p]",
          "start_time_in_millis": 1774628439343,
          "running_time_in_nanos": 396119144,
          "cancellable": false,
          "parent_task_id": "84184n_yQPibCTZBesmROA:28645361",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644818": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644818,
          "type": "transport",
          "action": "indices:data/write/bulk[s][p]",
          "start_time_in_millis": 1774628429324,
          "running_time_in_nanos": 10414961869,
          "cancellable": false,
          "parent_task_id": "84184n_yQPibCTZBesmROA:28644816",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644828": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644828,
          "type": "transport",
          "action": "indices:data/write/bulk[s][p]",
          "start_time_in_millis": 1774628429328,
          "running_time_in_nanos": 10410963570,
          "cancellable": false,
          "parent_task_id": "84184n_yQPibCTZBesmROA:28644826",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644826": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644826,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429327,
          "running_time_in_nanos": 10411230567,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415999",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28645370": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28645370,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439735,
          "running_time_in_nanos": 4148665,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        },
        "84184n_yQPibCTZBesmROA:28645359": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28645359,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628439340,
          "running_time_in_nanos": 398234112,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30416348",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644815": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644815,
          "type": "transport",
          "action": "indices:data/write/bulk[s][p]",
          "start_time_in_millis": 1774628429320,
          "running_time_in_nanos": 10419034685,
          "cancellable": false,
          "parent_task_id": "84184n_yQPibCTZBesmROA:28644814",
          "headers": {}
        },
        "84184n_yQPibCTZBesmROA:28644814": {
          "node": "84184n_yQPibCTZBesmROA",
          "id": 28644814,
          "type": "transport",
          "action": "indices:data/write/bulk[s]",
          "start_time_in_millis": 1774628429319,
          "running_time_in_nanos": 10419334896,
          "cancellable": false,
          "parent_task_id": "rrHuXzVfTo6Z0KfOmZS2tA:30415993",
          "headers": {}
        }
      }
    },
    "bYH966e8SVKxMCWzfsxJig": {
      "name": "monitoring-apps-es-master-1",
      "transport_address": "10.42.2.224:9300",
      "host": "10.42.2.224",
      "ip": "10.42.2.224:9300",
      "roles": [
        "master",
        "remote_cluster_client"
      ],
      "attributes": {
        "transform.config_version": "10.0.0",
        "ml.config_version": "12.0.0",
        "k8s_node_name": "dsk8cpdmzd3.dmzqa.local",
        "xpack.installed": "true"
      },
      "tasks": {
        "bYH966e8SVKxMCWzfsxJig:3478269": {
          "node": "bYH966e8SVKxMCWzfsxJig",
          "id": 3478269,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1774628439734,
          "running_time_in_nanos": 3698352,
          "cancellable": true,
          "cancelled": false,
          "parent_task_id": "kL7rMobiRX6lg_ovXXBS5g:1670399",
          "headers": {
            "trace.id": "6fceb842a68570bee892be04989df775"
          }
        }
      }
    }
  }
}

cjpowers · March 27, 2026, 4:49pm

GET _ilm/policy/logs-policy:

{
  "logs-policy": {
    "version": 3,
    "modified_date": "2026-03-26T17:57:38.538Z",
    "policy": {
      "phases": {
        "delete": {
          "min_age": "1d",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        },
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_age": "1d",
              "max_primary_shard_size": "20gb"
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
        "shrink-4hvu-.ds-logs-2026.03.26-000040",
        ".ds-logs-2026.03.27-000043",
        ".ds-logs-2026.03.26-000040"
      ],
      "data_streams": [
        "logs"
      ],
      "composable_templates": [
        "logs-idx-template"
      ]
    }
  }
}

eck-apps:
  enabled: true
  elasticsearch:
    externalLoggingEnabled: false
    master:
      replicas: 3
      storage: 6Gi
      memory: 6Gi
    dataHot:
      replicas: 3
      storage: 50Gi
      memory: 8Gi
      cpu: 1500m
    client:
      replicas: 2
      storage: 6Gi
      memory: 2Gi
    clientExternal:
      replicas: 2
      memory: 4Gi
  logging:
    number_of_shards: 3
    number_of_replicas: 1
    rollover: # https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-rollover.html#ilm-rollover-options
      max_age: "1d" # Triggers rollover after the maximum elapsed time from index creation is reached
      max_primary_shard_size: "20GB" # Triggers rollover when the index reaches a certain size
    shrink:
      number_of_shards: 0 #This didn't work, defaulted back to 1, had to change in Kibana
    delete: # https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-delete.html#ilm-delete-options
      min_age: "1d"

cjpowers · March 27, 2026, 4:51pm

Hot Threads:
https://pastes.io/a54tT4Q9

RainTown · March 27, 2026, 4:53pm

something. not adding up here

  logging:
    number_of_shards: 3
    number_of_replicas: 1
    rollover: # https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-rollover.html#ilm-rollover-options
      max_age: "1d" # Triggers rollover after the maximum elapsed time from index creation is reached
      max_primary_shard_size: "20GB" # Triggers rollover when the index reaches a certain size

I dont know much about ECK, but I think that config means rollover when

max_age: "1d" OR max_primary_shard_size: "20GB"

??

You have 3 primary shards. Thats 60GB total index size. Do you really index less than that per day?

I think it'd be useful to see the actual ILM policy here:

GET .ds-logs-2026.03.27-000043/_ilm/explain

I come also back to one of my earlier questions, what are you trying to do, in normal language, with your logs? What the desired lifecycle>? Because

dataHot:
  replicas: 3
  storage: 50Gi
  memory: 8Gi
  cpu: 1500m

seems quite tight. In passing, you are not giving those pods much oooomph in terms of disk or memory or CPU. with 50GB disk, and even 2x 20G shards on a node, you are already close to 80%.

EDIT: when it's in its "going a bit mad" phase, check the _cat/shards output for shards relocating. and for logs for warnings about disk space.

cjpowers · March 27, 2026, 5:51pm

I believe that is what those overrides are doing, it will rollover at 20gb (usually 1.5X that setting) or the rollover at night. We are also trying to figure out how much space we need to give the nodes.

The defaults are:

eck-apps:
  elasticsearch:
    data_hot:
      replicas: 2
      storage: 60Gi
      memory: 4Gi
    data_cold:
      replicas: 0
      storage: 60Gi
      memory: 4Gi
  logging:
    number_of_shards: 2
    number_of_replicas: 1
    rollover:
      max_age: "1d"
      max_primary_shard_size: "1GB"
    shrink:
      number_of_shards: 2
    delete:
      min_age: "1d"

And their manual states we would need 125GB to have some overhead.

We originally had 3 days, but we were trying to shorten that time in order to not fill up the disk. The overall goal is to get Elasticsearch working with at least a a few days of data for troubleshooting, and stable enough to get AlertManager, without killing our drives with that much data reads.

GET .ds-logs-2026.03.27-000043/_ilm/explain

{
  "indices": {
    ".ds-logs-2026.03.27-000043": {
      "index": ".ds-logs-2026.03.27-000043",
      "managed": true,
      "policy": "logs-policy",
      "index_creation_date_millis": 1774582573149,
      "time_since_index_creation": "13.31h",
      "lifecycle_date_millis": 1774582573149,
      "age": "13.31h",
      "phase": "hot",
      "phase_time_millis": 1774582573521,
      "action": "rollover",
      "action_time_millis": 1774582573922,
      "step": "check-rollover-ready",
      "step_time_millis": 1774582573922,
      "phase_execution": {
        "policy": "logs-policy",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_age": "1d",
              "min_docs": 1,
              "max_primary_shard_docs": 200000000,
              "max_primary_shard_size": "20gb"
            }
          }
        },
        "version": 3,
        "modified_date_in_millis": 1774547858538
      },
      "skip": false
    }
  }
}

RainTown · April 14, 2026, 1:40pm

Just to add some context that was shared me on a PM chat, in case others have ideas. The words below are mine, @cjpowers can correct if I've misunderstood something.

Elasticsearch is bundled in an overall product from a 3rd party vendor, with what he referred to as overrides,. The vendor-provided package has been installed as-is.

On vendor suggestion, they isolated the hot workload to a single worker node, via reducing statefulset down to one. That didn’t change the strange IO behavior. Also running on control plane vs. worker didn’t change the behavior. Moving from Netapp to local storage gave more throughput, sort of making the issue "worse", i.e. ever more IO.

The storage: 50Gi setting is seemingly ignored when that storage is local storage, or at least not respected. Go figure! The underlying filesystem has more than the 50GB space, 200GB, which was consistent with the _cat/nodes output that was previously shared.

Antivirus protections and snapshots are disabled.

Increasing RAM seemed to help for a bit, then the IO issue returned, see:

They re still investigating, dont know what the trigger is, but the significant IO IS from the ES cluster for that namespace, and just on the hot pods. If they block writes via below, the IO issue goes away, AFAIK effectively immediately.

PUT /.ds-logs*/_settings
{"index.blocks.write": true}

Things are harder In part because this 3rd party vendor's "bundled solution" is very much black box from my distance.

Topic		Replies	Views
High CPU Usage on a few data nodes / Hotspotting of data Elasticsearch	191	748	January 14, 2026
Elasticsearch 7.17.10 indexing bottleneck on i3.2xlarge and d3.2xlarge nodes in EKS Elasticsearch	53	1735	June 22, 2023
High CPU load Elasticsearch	10	959	May 10, 2022
What can I do to make the "readings" do not disturb "writings"? Elasticsearch	7	447	July 6, 2017
Abnormally high CPU usage for specific queries/dashboards Elasticsearch	2	179	May 15, 2024

High sustained read traffic from Elasticsearch hot pods

Related topics