I have a 15-node cluster with a mix of data sources. About 8% of the shards are large (10-30GB), 30% are smaller (1-15GB), and the majority, about 60%, are very small (megabytes).
The small shards exist because I have a lot of small, separate indices.
When ILM removes small shards from a node, the cluster allocates new shards onto that node even though its disk is already 90% full.
Elasticsearch amplifies the problem by creating all new shards on the same node (the one with 90% disk usage and the lowest number of shards). This causes a complete cluster freeze.
15x data node (32GB RAM, 4 cores, 500GB SSD, hot)
3x master node
3x coordinator node
5x hot node (32GB RAM, 4 cores, 2TB SAS)
When I analyse shard allocation to find out why all the shards end up on the same node, it reports this:
"node_decision" : "worse_balance", "weight_ranking" : 8
Can you explain the meaning of weight_ranking and worse_balance?
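For anyone who wants to reproduce the output above, here is a minimal sketch of how it could be pulled from the cluster allocation explain API and reduced to one line per node. The function names and the `base_url` parameter are hypothetical, and the sketch assumes an unsecured HTTP endpoint; only the `_cluster/allocation/explain` path and the `node_allocation_decisions`, `node_decision` and `weight_ranking` fields come from Elasticsearch itself.

```python
import json
import urllib.request


def fetch_allocation_explain(base_url, index, shard=0, primary=True):
    """Ask the cluster to explain the allocation of one shard.

    base_url is an assumption, e.g. "http://localhost:9200" with no auth.
    """
    body = json.dumps({"index": index, "shard": shard, "primary": primary}).encode()
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_decisions(explain):
    """Return (weight_ranking, node_name, node_decision) tuples, lowest rank first."""
    rows = [
        (d.get("weight_ranking"), d.get("node_name"), d.get("node_decision"))
        for d in explain.get("node_allocation_decisions", [])
    ]
    # Entries that carry no weight_ranking are sorted to the end.
    return sorted(rows, key=lambda r: (r[0] is None, r[0] or 0))
```

Calling `summarize_decisions(fetch_allocation_explain(base_url, index_name))` for one of the affected shards gives a compact per-node view that is easier to paste into a post than the full JSON.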
Can anyone please help me with this problem?
Am I missing something important about how allocation is done?
Can I configure the cluster so that it does not allocate all the shards to the same node?
The problem repeats on a different node every day. Each time I have to stop the data load, disable shard allocation on the problematic node, do a manual rollover, and then enable shard allocation again.
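The manual steps could be sketched as the sequence of REST requests below. This is only an illustration under assumptions: it uses the cluster-level allocation filter `cluster.routing.allocation.exclude._name` as the way to keep new shards off a node (note that this filter also moves existing shards away from it, which may or may not be wanted), and `workaround_requests`, `node_name` and `rollover_alias` are hypothetical names.

```python
def workaround_requests(node_name, rollover_alias):
    """Build the (method, path, body) requests for the manual workaround.

    Assumption: the node is taken out of allocation with the cluster-level
    exclude filter; other ways of blocking allocation are possible.
    """
    exclude = "cluster.routing.allocation.exclude._name"
    return [
        # 1. Stop the cluster from placing shards on the full node.
        ("PUT", "/_cluster/settings", {"persistent": {exclude: node_name}}),
        # 2. Roll over so new writes go to a freshly allocated index.
        ("POST", f"/{rollover_alias}/_rollover", None),
        # 3. Clear the filter again (a null value resets the setting).
        ("PUT", "/_cluster/settings", {"persistent": {exclude: None}}),
    ]
```

The function only builds the requests; sending them (and stopping or resuming the data load around them) is left to whatever HTTP client is already in use.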
I am thinking of a quick and dirty solution, such as a cron script that evaluates disk space and number of shards and places fake shards on the problematic node. I believe there must be a proper solution.
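The evaluation half of such a cron script might look like the sketch below. It assumes the rows come from `GET _cat/allocation?format=json`, where `node`, `shards` and `disk.percent` are returned as text. The function names, the 90% threshold default, and the "fewer shards than the cluster average" rule are assumptions chosen to match the symptom described above, not a recommended policy.

```python
def to_int(value):
    """_cat values arrive as text; missing values arrive as None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def find_risky_nodes(rows, disk_threshold=90):
    """Return nodes at or above the disk threshold that hold fewer shards than average.

    These are the nodes the balancer is likely to keep filling, because
    their low shard count makes them look attractive.
    """
    nodes = []
    for row in rows:
        shards = to_int(row.get("shards"))
        disk = to_int(row.get("disk.percent"))
        if shards is None or disk is None:
            continue  # e.g. the UNASSIGNED row has no disk figures
        nodes.append((row.get("node"), shards, disk))
    if not nodes:
        return []
    mean_shards = sum(s for _, s, _ in nodes) / len(nodes)
    return sorted(n for n, s, d in nodes if d >= disk_threshold and s < mean_shards)
```

What the script should do with the flagged nodes is exactly the part I am unsure about, which is why I would prefer a built-in setting over this.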