ILM policy for indices from APM server

Hi,

I’m running Elasticsearch 9.0.3 managed by ECK on AKS, and I’m seeing persistently high JVM heap usage on my warm nodes.

Cluster topology

  • 3 × master nodes

  • 2 × hot data nodes

  • 2 × warm data nodes

  • Persistent volumes per data node

  • JVM heap on warm nodes: 2.5 GB

Workload

  • APM data streams:

    • traces-apm-*

    • logs-apm.*

    • metrics-apm.*

  • Traces volume: 20+ GB per day

  • Logs/metrics: typically tens of MB per day

  • Rollover currently triggers daily or at ~50 GB, whichever comes first

ILM (current)

  • hot → warm after ~8 days

  • delete after 180 days

  • replicas = 0 (temporarily set on warm to reduce pressure)

Problem

  • One warm node currently holds ~1 TB of data and ~950 shards

  • JVM heap usage on that node stays around 92–93%

  • Heap pressure appears to be driven by shard/segment overhead rather than fielddata
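
To put rough numbers on the shard overhead, here is a back-of-the-envelope sketch of how many backing indices a single low-volume data stream keeps over the 180-day retention at different rollover ages. It assumes one primary shard per backing index, no replicas, and that rollover is triggered by age only (the size condition is never reached for streams writing tens of MB per day). `ASSUMED_LOW_VOLUME_STREAMS` is a hypothetical placeholder, not a number measured on my cluster.

```python
import math

RETENTION_DAYS = 180  # delete phase from the current policy

# Hypothetical number of low-volume logs/metrics data streams (not measured).
ASSUMED_LOW_VOLUME_STREAMS = 10


def backing_indices_retained(retention_days: int, rollover_days: int) -> int:
    """Approximate backing indices kept by one data stream that rolls over on age alone."""
    return math.ceil(retention_days / rollover_days)


for rollover_days in (1, 10, 30):
    per_stream = backing_indices_retained(RETENTION_DAYS, rollover_days)
    # One primary shard per backing index and zero replicas is assumed here.
    total_shards = per_stream * ASSUMED_LOW_VOLUME_STREAMS
    print(f"rollover every {rollover_days:>2}d: "
          f"{per_stream:>3} indices per stream, "
          f"~{total_shards} shards for {ASSUMED_LOW_VOLUME_STREAMS} streams")
```

Under these assumptions, each low-volume stream drops from about 180 retained backing indices with daily rollover to about 6 with a 30-day rollover, which is the reduction I am hoping the split would give.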

Question

I’m considering splitting ILM into two policies:

  1. Traces policy

    • rollover: 50GB OR 1d

    • hot → warm after ~8 days

    • delete after 180 days

  2. Logs/metrics policy

    • rollover: 5GB OR 10–30d

    • hot → warm after ~8 days

    • delete after 180 days
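
For concreteness, this is roughly what I think the two policy bodies would look like, built here as Python dicts and printed as JSON. It is only a sketch: the policy names are hypothetical, I am assuming `max_primary_shard_size` as the size condition (my current policy may use a different one), I picked 30d from the 10–30d range for illustration, and the warm `allocate` action simply mirrors the temporary replicas = 0 setting.

```python
import json


def ilm_policy(max_primary_shard_size: str, max_age: str) -> dict:
    """Build an ILM policy body with hot, warm and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Rollover fires when either condition is met.
                        "rollover": {
                            "max_primary_shard_size": max_primary_shard_size,
                            "max_age": max_age,
                        }
                    }
                },
                "warm": {
                    # After a rollover, min_age is counted from rollover time.
                    "min_age": "8d",
                    # Mirrors the temporary replicas = 0 setting on warm.
                    "actions": {"allocate": {"number_of_replicas": 0}},
                },
                "delete": {"min_age": "180d", "actions": {"delete": {}}},
            }
        }
    }


# Policy names are hypothetical placeholders.
policies = {
    "apm-traces-policy": ilm_policy("50gb", "1d"),
    "apm-logs-metrics-policy": ilm_policy("5gb", "30d"),  # 30d chosen for illustration
}

for name, body in policies.items():
    print(f"PUT _ilm/policy/{name}")
    print(json.dumps(body, indent=2))
```

One thing I am unsure about: because `min_age` is measured from rollover, a logs/metrics index that rolls over at 30 days would move to warm about 38 days after creation and be deleted about 210 days after creation, so the phase ages may need adjusting.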

Is this a recommended approach for APM-heavy clusters?

Thanks in advance for any guidance or real-world experience.