After quite some time running a cluster with 444 indices, I suddenly experience a problem when creating new indices:
all shards of a new index get allocated on the same node (replicas = 0).
It's an 8-node cluster, and I use
index.routing.allocation.require.storage_type to assign new indices to only 4 of them.
The /_nodes API shows that all 4 nodes have the
storage_type attribute set correctly (and in the past this worked fine).
I'm below the disk watermarks, and I cannot find a reason why Elasticsearch would allocate all shards onto the same node.
I'm on ES 2.2.
Does anyone have an idea how I can debug this weird shard allocation problem?
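For reference, index creation looks roughly like this (the index name and the storage_type value are placeholders, not my real ones):

```shell
# Create an index with no replicas, restricted to nodes that carry the
# custom "storage_type" node attribute (value "fast" is a placeholder):
curl -XPUT 'http://localhost:9200/my_new_index' -d '{
  "settings": {
    "number_of_replicas": 0,
    "index.routing.allocation.require.storage_type": "fast"
  }
}'
```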
Is it possible that the node that receives the new shards has fewer shards than the other ones with the same storage_type? Can you check how many shards each of the 4 nodes has?
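The per-node shard counts and disk usage can be checked with the _cat APIs, for example:

```shell
# One line per node: shard count, disk used/available, host, node name
curl 'http://localhost:9200/_cat/allocation?v'

# Or list every shard with its assigned node for a closer look
curl 'http://localhost:9200/_cat/shards?v'
```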
Yes, you're right. The other 3 have 520 shards each, but this one (which gets all the new ones) only has 344.
Actually I have no idea why this is not balanced overall.
Also, I thought that shard allocation balancing works on a per-index basis.
Can you recommend any solution to this state?
Should I manually reroute a lot of shards?
Shard balancing works on the basis of two factors: one tries to spread shards of the same index across nodes, the other tries to assign the same number of shards to each node. In this case, the node balance is so skewed that the index-spread factor does not play a role. One possibility is to adapt the weights of the two factors (see https://www.elastic.co/guide/en/elasticsearch/reference/2.3/shards-allocation.html#_shard_balancing_heuristics ). The better solution is to investigate why 3 of the nodes have more shards than the other one. Are there any active relocations going on in your cluster? (Maybe node 4 was down and the cluster is still trying to rebalance shards onto it.)
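If you do want to adjust the two factors, it is a dynamic cluster settings update; a sketch (the values here are illustrative only, not a recommendation; the defaults are balance.shard = 0.45 and balance.index = 0.55):

```shell
# Lower the per-node shard-count weight and raise the per-index spread
# weight (illustrative values; defaults are 0.45 and 0.55):
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.balance.shard": 0.30,
    "cluster.routing.allocation.balance.index": 0.70
  }
}'
```

Using "transient" means the change does not survive a full cluster restart, which is usually what you want while experimenting.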
Thanks for the explanation.
Ah, I just noticed that the node with fewer shards is actually close to the low disk watermark.
This is probably the reason.
Can I expect ES to re-balance automatically when I free up disk space on this node?
Yes. Shards should start relocating to the node pretty quickly after that (disk space usage is polled every 30 seconds).
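If relocation still doesn't start, the watermark thresholds and the polling interval are also dynamic settings; a sketch, assuming the 2.x defaults (low watermark 85%, high watermark 90%, polling interval 30s; the values below are illustrative):

```shell
# Temporarily raise the low watermark and poll disk usage more often
# (illustrative values; 2.x defaults: low 85%, high 90%, interval 30s):
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.info.update.interval": "10s"
  }
}'
```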