Incorrect shard allocation

I have a 15-node cluster with a mix of data sources: about 8% are larger shards (10-30GB), 30% are smaller shards (1-15GB), and the majority, 60%, are very small shards measured in megabytes.
The small shards exist because we have a lot of small, separate indices.

When ILM removes small shards from a node, the cluster allocates new shards onto that node even though its disk is already 90% full.

The problem is amplified because Elasticsearch then creates all new shards on that same node (the one with 90% of its disk allocated and the lowest number of shards). This causes a complete cluster freeze.
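For reference, here is a rough sketch of how the default disk watermarks (85% low, 90% high, 95% flood stage) are documented to affect allocation. The helper names are made up for illustration, and the thresholds assume the `cluster.routing.allocation.disk.watermark.*` defaults have not been changed. It shows one possible reason a nearly full node can still receive shards: the low watermark does not hold back primaries of newly created indices.

```python
# Default disk watermarks, assuming cluster.routing.allocation.disk.watermark.*
# has not been overridden on this cluster.
LOW, HIGH, FLOOD = 85.0, 90.0, 95.0


def disk_stage(used_percent: float) -> str:
    """Hypothetical helper: name the watermark stage for a node's disk usage."""
    if used_percent > FLOOD:
        # Indices with a shard on this node get a read-only-allow-delete block.
        return "flood_stage"
    if used_percent > HIGH:
        # Elasticsearch tries to relocate shards away; nothing new is allocated.
        return "high"
    if used_percent > LOW:
        # No new shards, except primaries of newly created indices.
        return "low"
    return "ok"


def accepts_new_index_primary(used_percent: float) -> bool:
    """Primaries of new indices are only held back above the high watermark."""
    return disk_stage(used_percent) in ("ok", "low")
```

Under these assumptions a node at 87% disk usage still accepts the primary of a freshly rolled-over index, while a node at 92% does not.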

The Cluster

  • v7.13.3
  • 15x data node (32GB RAM, 4 cores, 500GB SSD HOT)
  • 3x master node
  • 3x coordinator node
  • 5x hot node (32GB RAM, 4 cores, 2TB SAS)

When I analyse shard allocation to find out why all shards end up on the same node, it reports this:

      "node_decision" : "worse_balance",
      "weight_ranking" : 8

Can you explain the meaning of weight_ranking and worse_balance?
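For context, a rough illustration of what such a ranking is based on: in 7.x the balancer scores each node with a weight built from shard counts (the total on the node and the count for the index in question), not from disk usage. The sketch below is a simplified reading of that weight function using the default balance factors (`cluster.routing.allocation.balance.shard` = 0.45, `cluster.routing.allocation.balance.index` = 0.55). The function names and node names are invented, and this is not the actual Elasticsearch code.

```python
def node_weight(node_shards, node_index_shards, avg_shards, avg_index_shards,
                shard_balance=0.45, index_balance=0.55):
    """Simplified balancer weight: lower means a more attractive target."""
    total = shard_balance + index_balance
    theta_shard = shard_balance / total
    theta_index = index_balance / total
    return (theta_shard * (node_shards - avg_shards)
            + theta_index * (node_index_shards - avg_index_shards))


def rank_nodes(total_shards, index_shards):
    """Return node names ordered from lowest weight (best) to highest."""
    nodes = list(total_shards)
    avg_shards = sum(total_shards.values()) / len(nodes)
    avg_index = sum(index_shards.get(n, 0) for n in nodes) / len(nodes)
    weights = {
        n: node_weight(total_shards[n], index_shards.get(n, 0),
                       avg_shards, avg_index)
        for n in nodes
    }
    return sorted(nodes, key=lambda n: weights[n])


# Hypothetical shard counts: node-c has just lost many small shards.
shard_counts = {"node-a": 120, "node-b": 118, "node-c": 60}
print(rank_nodes(shard_counts, {}))  # ['node-c', 'node-b', 'node-a']
```

In this model the node with the fewest shards ranks first regardless of how full its disk is, which matches the behaviour described above.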

Can anyone please help me with this problem?
Am I missing something important about how allocation is done?

Can I configure the cluster to avoid allocating all shards on the same node?

The problem repeats on different nodes every day. Each time I have to stop the data load, disable shard allocation on the problematic node, do a manual rollover, and then re-enable shard allocation.
I am thinking of a dirty workaround, such as a cron script that evaluates disk space and shard count and places fake shards on the problematic node. I believe there must be a proper solution.

Thank you.

It sounds like you may need to layer index-level shard allocation awareness into your setup (alternative reference). With ILM, Elastic recommends using data tiers for automated routing based on node roles; however, this still works in tandem with node attributes. (Note: node attributes override node roles during allocation routing.)
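To make the two routing styles concrete, here is a minimal sketch of the index settings each one would use. `index.routing.allocation.require.<attribute>` and `index.routing.allocation.include._tier_preference` are existing index settings; the attribute name `box_type`, its value, and the helper function names are assumptions chosen for illustration, since the thread does not say which attribute the cluster uses.

```python
def attribute_routing(attribute: str, value: str) -> dict:
    """Index settings that pin an index to nodes with a custom node attribute."""
    return {f"index.routing.allocation.require.{attribute}": value}


def tier_routing(*tiers: str) -> dict:
    """Index settings that route by data tier, in order of preference."""
    return {"index.routing.allocation.include._tier_preference": ",".join(tiers)}


# "box_type" is a hypothetical attribute (node.attr.box_type in elasticsearch.yml).
print(attribute_routing("box_type", "hot"))
# Prefer warm nodes, fall back to hot nodes if no warm node is available.
print(tier_routing("data_warm", "data_hot"))
```

Either dictionary could be placed in the `settings` section of an index template or applied by an ILM allocate or migrate step.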


Thank you @Stef_Nestor for the very prompt response.
I recently worked around the problem by adding this to the index template:

    "routing.allocation.total_shards_per_node":"1"

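One caveat with this workaround: the limit counts primaries and replicas together, so a value of 1 only works while an index has no more shard copies than there are eligible data nodes. Otherwise some copies stay unassigned. A small sketch of that arithmetic follows; the function names and the example shard counts are illustrative and not taken from the cluster above, apart from the 15 data nodes.

```python
import math


def min_total_shards_per_node(primaries: int, replicas: int, data_nodes: int) -> int:
    """Smallest per-node limit that still lets every shard copy be assigned."""
    copies = primaries * (1 + replicas)
    return math.ceil(copies / data_nodes)


def unassigned_copies(primaries: int, replicas: int, data_nodes: int, limit: int) -> int:
    """Shard copies that cannot be placed under the given per-node limit."""
    copies = primaries * (1 + replicas)
    return max(0, copies - data_nodes * limit)


# 1 primary + 1 replica on 15 data nodes fits with a limit of 1.
print(min_total_shards_per_node(1, 1, 15))   # 1
# 10 primaries + 1 replica each is 20 copies, which needs a limit of 2.
print(min_total_shards_per_node(10, 1, 15))  # 2
print(unassigned_copies(10, 1, 15, 1))       # 5
```

For the many small single-shard indices described above a limit of 1 is enough, but larger indices may need a higher value in their own template.
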
@stef, you recommend migrating from node attributes to node roles. What is the reason?
Will node attributes be replaced by tiers in a new version?
Thank you.
