Weird index shard allocation behaviour

Hello everyone,

I wonder if you can help me - I have a very weird problem.

I am running the ELK stack version 7.12.0
I have a 5 data node Elasticsearch cluster - 8 TB of disk, 32 GB RAM and 8 CPUs each
Utilization of disk on the nodes is between 78% to 89% (on data-5)
I have a periodic Logstash job that pulls data from two tables (call them front end and back end) and pushes it to three daily-partitioned Elasticsearch indexes - call them fe_data, be_partial_data and be_full_data. fe_data has 1 replica; the back end data indexes have no replicas. All 3 indexes have 3 shards.

The data volume is quite high - front end data is about 20 GB per day, be_partial_data is under 1 GB per day, and be_full_data is over 300 GB per day.
As part of the Logstash job that populates the back end indexes, it searches the corresponding front end indexes to enrich the back end data.

The issue is the following
Even though data-5 has the highest disk utilization (up to 89%), when a new day starts and the day's indexes are created, all shards of all indexes (except the fe_data replica shards) are allocated to the data-5 node.

This causes very high CPU utilization on the data-5 node (up to 90%) while the other data nodes are more or less idle (less than 30% CPU utilization).

I have tried to mess around with the shard balancing heuristics (currently cluster.routing.allocation.balance.shard is 0.35 and cluster.routing.allocation.balance.index is 0.8) but the same behaviour persists.
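For reference, the change I applied went through the cluster settings API, roughly like this (the values are the ones mentioned above; the defaults are 0.45 and 0.55):

```
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.shard": 0.35,
    "cluster.routing.allocation.balance.index": 0.8
  }
}
```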

Any and all help will be highly appreciated

Thank you in advance

How many primary and replica shards does the problematic index have? Have you tried using the total shards per node index setting to force distribution across multiple nodes? If you do, make sure to use a value that still allows all shards to be allocated even if a node fails.
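The setting in question is index.routing.allocation.total_shards_per_node. A sketch of applying it to one of your daily indexes via the update settings API (the index name here is just an example):

```
PUT /be_full_data-2021.05.01/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
```

Note that this is a hard limit - if it is set too tightly and a node fails, shards may stay unassigned, so leave some headroom.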

Initially all problematic indexes had 3 primary shards and no replica shards
I have since changed the shard layout to the following:

  1. front end index - 1 shard + 1 replica
  2. back end partial data - 1 shard
  3. back end full index - 5 shards

I haven't changed the total shards per node index setting yet but will do so.
I can set it to 2 so that all shards can still be allocated even if a node fails - since I have 5 data nodes and no index has more than 5 shards.
But why does Elasticsearch insist on allocating everything to data-5?
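Since the indexes are created daily, I will also need the setting in an index template so that tomorrow's indexes pick it up - something along these lines (the template name and pattern are just examples for my big back end index):

```
PUT /_index_template/be_full_data_template
{
  "index_patterns": ["be_full_data-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 5,
      "index.routing.allocation.total_shards_per_node": 2
    }
  }
}
```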

By default Elasticsearch won't allocate any shards to nodes that are >85% full, so I think you must have adjusted the disk watermarks to cause this.

Hmmm... As per the documentation, cluster.routing.allocation.disk.watermark.low only affects replica shards and not the primary shards of newly created indexes - and it is set to 85% (the default value). I tried lowering it to 80% and 82% with no change.
cluster.routing.allocation.disk.watermark.high affects all shards and acts as a hard block.
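For completeness, the way I lowered the low watermark was via the cluster settings API, roughly like this (transient so it is easy to revert):

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "80%"
  }
}
```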

What I would want is a setting that says: avoid allocating shards to a node more than X% full unless they can't be allocated elsewhere. It would also be useful to have a setting that says: avoid allocating shards of indexes A, B and C on the same node (unless there is no other option).

The point is that with 40 TB of space, even a 90% threshold (the high watermark) still leaves 4 TB available.

However, I took your advice and changed the total shards per node index setting for my large back end index, and while the cluster is still relocating shards, the CPU utilization on data-5 has already gone down.
Thank you

Ohh yes, sorry, you're quite right - the low watermark doesn't apply here after all.