ILM policy for indices from APM server

Hi,

I’m running Elasticsearch 9.0.3 managed by ECK on AKS, and I’m seeing persistently high JVM heap usage on my warm nodes.

Cluster topology

  • 3 × master nodes

  • 2 × hot data nodes

  • 2 × warm data nodes

  • Persistent volumes per data node

  • JVM heap on warm nodes: 2.5 GB

Workload

  • APM data streams:

    • traces-apm-*

    • logs-apm.*

    • metrics-apm.*

  • Traces volume: 20+ GB per day

  • Logs/metrics: typically tens of MB per day

  • Rollover currently happens daily (or at ~50 GB, whichever comes first)

ILM (current)

  • hot → warm after ~8 days

  • delete after 180 days

  • replicas = 0 (temporarily set on warm to reduce pressure)

Problem

  • One warm node currently holds ~1 TB of data and ~950 shards

  • JVM heap usage on that node stays around 92–93%

  • Heap pressure appears to be driven by shard/segment overhead rather than fielddata
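
For reference, these numbers can be confirmed with the cat APIs, e.g.:

    # JVM heap per node
    GET _cat/nodes?v&h=name,node.role,heap.percent,heap.max

    # Shard count and disk used per node
    GET _cat/allocation?v

    # Which shards sit on the warm nodes
    GET _cat/shards?v&h=index,shard,prirep,store,node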

Question
I’m considering splitting ILM into two policies (sketched below):

  1. Traces policy

    • rollover: 50GB OR 1d

    • hot → warm after ~8 days

    • delete after 180 days

  2. Logs/metrics policy

    • rollover: 5GB OR 10–30d

    • hot → warm after ~8 days

    • delete after 180 days
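
In policy form, the split would look roughly like this (policy names are placeholders, and I’ve assumed max_primary_shard_size for the size condition, with the 30d upper end of the logs/metrics range):

    # Traces: large volume, frequent rollover
    PUT _ilm/policy/traces-apm-policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": { "max_primary_shard_size": "50gb", "max_age": "1d" }
            }
          },
          "warm": {
            "min_age": "8d",
            "actions": { "allocate": { "number_of_replicas": 0 } }
          },
          "delete": { "min_age": "180d", "actions": { "delete": {} } }
        }
      }
    }

    # Logs/metrics: small volume, infrequent rollover
    PUT _ilm/policy/logs-metrics-apm-policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": { "max_primary_shard_size": "5gb", "max_age": "30d" }
            }
          },
          "warm": {
            "min_age": "8d",
            "actions": { "allocate": { "number_of_replicas": 0 } }
          },
          "delete": { "min_age": "180d", "actions": { "delete": {} } }
        }
      }
    }

Both policies would still need to be referenced from the relevant index templates (index.lifecycle.name) so that new backing indices pick them up after rollover.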

Is this a recommended approach for APM-heavy clusters?

Thanks in advance for any guidance or real-world experience.

Yes, splitting ILM policies by data type (traces vs logs/metrics) is a recommended and proven approach for APM-heavy clusters.

Your main issue is shard and segment overhead on the warm nodes, not data volume itself. Traces generate far more shards and segments than logs/metrics, and with a single shared policy the low-volume streams still roll over daily, adding many tiny shards for very little data. Isolating the two with dedicated ILM policies is the correct move.

Key additional recommendations:

  • Reduce shard count aggressively for traces (fewer, larger shards).

  • Consider a shorter hot phase for traces.

  • Apply forcemerge (max 1 segment) before or during the warm transition; see the sketch after this list.

  • Avoid long retention on the warm tier for traces unless it’s required.

  • If possible, increase heap on the warm nodes or add one more warm node.
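
For the shrink and forcemerge points, here is a minimal sketch reusing the traces policy name from your post (values are illustrative, and shrink only matters if the backing indices have more than one primary shard):

    # Sketch: shrink + forcemerge on the warm transition
    PUT _ilm/policy/traces-apm-policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": { "max_primary_shard_size": "50gb", "max_age": "1d" }
            }
          },
          "warm": {
            "min_age": "8d",
            "actions": {
              "shrink": { "number_of_shards": 1 },
              "forcemerge": { "max_num_segments": 1 },
              "allocate": { "number_of_replicas": 0 }
            }
          },
          "delete": { "min_age": "180d", "actions": { "delete": {} } }
        }
      }
    }

ILM applies warm-phase actions in a fixed order (shrink runs before force merge), so the JSON order doesn’t matter; once the merge completes, GET _cat/segments/<index>?v should report a single segment per shard.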

Overall: Yes, your approach is correct, but shard reduction and segment consolidation are the real fixes.


@Rafa_Silva Thanks a lot for the response!
This is very helpful and confirms what I was suspecting.
