Balance disk usage across warm nodes

Elasticsearch: 9.0.3 (ECK managed)

  1. Kubernetes: AKS

  2. Topology: 2 hot data nodes, 2 warm data nodes, 3 master nodes

  3. Storage: persistent volumes per data node

  4. Workload: APM traces (plus APM logs/metrics); data streams with rollover (~5 GB / ~8 h)

  5. ILM: hot → warm (after ~10 days), then delete after 180 days

  6. Replicas: currently 0 on hot; set to 0 on warm temporarily to reduce pressure

    After setting replicas=0 to stabilize things, one warm node’s disk keeps filling up much more than the other’s:

    es-warm-0: disk.total 393.1gb, disk.used 343.2gb, disk.avail 49.9gb
    es-warm-1: disk.total 393.1gb, disk.used 298.9gb, disk.avail 94.1gb
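
    (Per-node figures like these can be pulled from the cat allocation API, for example; the column list and sort are optional:)

    GET _cat/allocation?v&h=node,shards,disk.total,disk.used,disk.avail&s=disk.avail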

    What’s the recommended way to make the allocator prioritize disk usage so warm nodes converge on similar free space?

How I resolved it

I managed to balance disk usage across my nodes without overloading the JVM or hitting circuit breakers. Here’s what I did step by step:

  1. Throttle relocations first (avoid heap spikes/circuit breakers during moves)
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": "1",
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": "1",
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": "1",
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}
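
While the throttles are in place, relocation progress can be watched with the cat recovery API (the column list here is just a convenient subset):
GET _cat/recovery?v&active_only=true&h=index,shard,source_node,target_node,stage,bytes_percent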
  2. Use absolute disk watermarks (react before disks are critically full)
    Adjust the GB values to your disk size; with absolute values, each watermark is a threshold of free space remaining on a node.
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low":  "25gb",
    "cluster.routing.allocation.disk.watermark.high": "20gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb"
  }
}
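
If a shard still refuses to move off the fuller node, the cluster allocation explain API shows which rule (including these watermarks) is blocking it. The index name below is a placeholder:
# "my-backing-index" is a placeholder; substitute a real backing index name from GET _cat/shards
GET _cluster/allocation/explain
{
  "index": "my-backing-index",
  "shard": 0,
  "primary": true
}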
  3. Balance by disk usage and shard count (keep free space and shard counts close)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.disk_usage": "0.60",
    "cluster.routing.allocation.balance.shard":      "0.35",
    "cluster.routing.allocation.balance.index":      "0.05"
  }
}
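
Optional follow-up: once disk usage has converged, the transient throttles from step 1 can be reset to their defaults by setting them to null:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": null,
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": null,
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": null,
    "indices.recovery.max_bytes_per_sec": null
  }
}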

Result: disk usage is now more even across nodes, shard counts per node are close, and JVM stays stable (no circuit breaker trips during rebalancing).
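
One way to keep an eye on it going forward is a single cat nodes call showing roles, heap, and disk headroom per node:
GET _cat/nodes?v&h=name,node.role,heap.percent,disk.used_percent&s=name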