Balance disk usage across warm nodes

How I resolved it

I managed to balance disk usage across my nodes without overloading the JVM or hitting circuit breakers. Here’s what I did step by step:

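  1. Throttle relocations first (avoids heap spikes and circuit-breaker trips during shard moves)
    Note: transient settings are lost on a full cluster restart and have been deprecated since Elasticsearch 7.16; on newer versions, use "persistent" instead.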
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": "1",
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": "1",
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": "1",
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}
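If you script these changes instead of typing them into the console, here is a minimal Python sketch (standard library only) that builds the same request. It also builds a second request that sets the same keys to null, which restores their defaults. The endpoint `http://localhost:9200` and the helper names are assumptions; add whatever auth/TLS your cluster needs before sending with `urlopen`.

```python
import json
import urllib.request

# Hypothetical cluster address; replace with your own endpoint.
ES_URL = "http://localhost:9200"

# Throttling values from step 1.
THROTTLE = {
    "cluster.routing.allocation.cluster_concurrent_rebalance": "1",
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": "1",
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": "1",
    "indices.recovery.max_bytes_per_sec": "40mb",
}


def settings_request(scope, settings, base_url=ES_URL):
    """Build (but do not send) a PUT /_cluster/settings request."""
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def reset_request(scope, keys, base_url=ES_URL):
    """Set each key to null (None in Python), which restores its default."""
    return settings_request(scope, {k: None for k in keys}, base_url)


throttle = settings_request("transient", THROTTLE)
undo = reset_request("transient", THROTTLE)
# To send: urllib.request.urlopen(throttle)
```

Sending `undo` once the rebalance has settled returns the four throttles to their defaults.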
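  2. Use absolute disk watermarks (react before disks are critically full)
    Adjust the sizes to your disks. Absolute values mean minimum free space, so low must be larger than high, and high larger than flood_stage. Don't mix them with percentage values.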
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low":  "25gb",
    "cluster.routing.allocation.disk.watermark.high": "20gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb"
  }
}
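Because absolute watermarks count free space rather than used space, it is easy to get the order backwards. A small Python sanity check, assuming Elasticsearch-style byte units (powers of 1024). It rejects percentage values outright, and it insists on strictly decreasing values, which may be stricter than Elasticsearch itself. The helper names are hypothetical.

```python
import re

# Binary multipliers: Elasticsearch byte units are powers of 1024.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def to_bytes(value):
    """Parse a byte-size string such as '25gb' into an integer byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgtp]?b)\s*", value.lower())
    if not m:
        # Percentages like '85%' land here: they must not be mixed with byte values.
        raise ValueError(f"not an absolute byte value: {value!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def check_absolute_watermarks(low, high, flood):
    """Absolute watermarks are free space left, so expect low > high > flood_stage."""
    lo, hi, fl = to_bytes(low), to_bytes(high), to_bytes(flood)
    if not lo > hi > fl:
        raise ValueError(
            f"expected low > high > flood_stage free space, got {low}, {high}, {flood}"
        )
    return lo, hi, fl


print(check_absolute_watermarks("25gb", "20gb", "10gb"))
# → (26843545600, 21474836480, 10737418240)
```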
  3. Balance by disk usage and shard count (keeps free space and shard counts close across nodes)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.disk_usage": "0.60",
    "cluster.routing.allocation.balance.shard":      "0.35",
    "cluster.routing.allocation.balance.index":      "0.05"
  }
}
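To confirm the weights were actually stored, you can compare the body of `GET /_cluster/settings?flat_settings=true` against what you sent. A rough Python sketch, assuming the response body is already in hand as a string. Values are compared as floats so "0.6" and "0.60" count as equal. The sample response is made up and deliberately missing one key; the function name is hypothetical.

```python
import json

# Values from step 3.
EXPECTED_BALANCE = {
    "cluster.routing.allocation.balance.disk_usage": "0.60",
    "cluster.routing.allocation.balance.shard": "0.35",
    "cluster.routing.allocation.balance.index": "0.05",
}


def missing_or_different(response_text, expected, scope="persistent"):
    """Return {key: current_value} for every expected key that is absent or differs."""
    current = json.loads(response_text).get(scope, {})
    out = {}
    for key, want in expected.items():
        got = current.get(key)
        if got is None or float(got) != float(want):
            out[key] = got
    return out


# Made-up response for illustration: the index weight was never set.
sample = (
    '{"persistent": {'
    '"cluster.routing.allocation.balance.disk_usage": "0.60", '
    '"cluster.routing.allocation.balance.shard": "0.35"}, '
    '"transient": {}}'
)
print(missing_or_different(sample, EXPECTED_BALANCE))
# → {'cluster.routing.allocation.balance.index': None}
```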

Result: disk usage is now more even across nodes, shard counts per node are close, and JVM heap stays stable (no circuit-breaker trips during rebalancing).
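To put numbers on "more even", you can compute the spread from `GET _cat/allocation?format=json`. A minimal Python sketch, assuming the JSON body is already fetched. It skips the UNASSIGNED row, which has no disk figures. The optional `only` filter (for example, your warm node names) and the sample numbers are hypothetical, invented for illustration.

```python
import json


def balance_report(cat_allocation_json, only=None):
    """Spread of disk.percent and shard counts across nodes.

    Expects the body of GET _cat/allocation?format=json. `only` is an optional
    set of node names (hypothetical, e.g. your warm nodes) to restrict the report.
    """
    rows = [
        r for r in json.loads(cat_allocation_json)
        # The UNASSIGNED row has null disk figures, so skip it.
        if r.get("disk.percent") is not None
        and (only is None or r.get("node") in only)
    ]
    if not rows:
        raise ValueError("no node rows to report on")
    disk = [int(r["disk.percent"]) for r in rows]
    shards = [int(r["shards"]) for r in rows]
    return {
        "nodes": len(rows),
        "disk_percent_spread": max(disk) - min(disk),
        "shard_count_spread": max(shards) - min(shards),
    }


# Invented sample for illustration only.
sample = """[
  {"shards": "120", "disk.percent": "71", "node": "warm-1"},
  {"shards": "118", "disk.percent": "68", "node": "warm-2"},
  {"shards": "121", "disk.percent": "73", "node": "warm-3"},
  {"shards": "2", "disk.percent": null, "node": "UNASSIGNED"}
]"""
print(balance_report(sample))
# → {'nodes': 3, 'disk_percent_spread': 5, 'shard_count_spread': 3}
```

Running it before and after the settings change gives a simple way to compare the spread.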