Graceful shard management?

Most challenging problems occur "when things have gone wrong". We have Warm Nodes that are oversharded and that have network-attached storage mounted by mistake. As a result, a good amount of data already lives on the NAS, and the performance impact now destabilizes the whole 35-node cluster daily.

We want to devise a way to move shards off the affected Warm Nodes gradually, while at the same time preventing any new shards from being allocated to the affected nodes. We don't want to dump all shards at once the way the allocation.exclude filter does; we just want to prevent new shards from appearing on the target node. We have dozens of policies and templates which are not organized or standardized, so we're hoping for a cluster-wide solution.

Is there a way to simply say "don't put any new shards on this node"?

Thanks!

Hi Curtis!

No, sorry. Roughly speaking, at the cluster level either you want shards on a node or you don't. You could perhaps apply an allocation filter only to new indices?
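For illustration, one way to do that is a catch-all legacy index template that adds an exclusion setting to every newly created index (the node name `warm-node-5` is hypothetical; legacy templates merge with your existing ones by `order`, whereas composable index templates do not merge this way, so check which kind you use):

```
PUT _template/keep-off-warm-node-5
{
  "index_patterns": ["*"],
  "order": 100,
  "settings": {
    "index.routing.allocation.exclude._name": "warm-node-5"
  }
}
```

Existing indices are unaffected; only indices created after the template is in place pick up the exclusion.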

Why do you need to do anything more gradually than Elasticsearch would with an allocation filter that excludes the node? There are already controls in place to make sure that a node is evacuated slowly enough that it doesn't destabilise the cluster. Do you just not have the space to put the shards elsewhere right now?
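For reference, the cluster-wide exclusion mentioned above looks something like this (the node name is hypothetical):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "warm-node-5"
  }
}
```

This tells the allocator to move all shards off that node, but the actual relocations are still throttled by the usual recovery limits rather than happening all at once.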

6 of our 10 Warm Nodes have 1 SSD and 1 NFS mount, while the remaining 4 have only SSD. When we bring 1 of the NFS/SSD nodes down, the subsequent relocation of shards, which involves the 5 other NFS/SSD nodes, causes a cascading failure of those remaining 5. The NFS can't keep up with the juggling.

That usually means your recovery settings are too aggressive. Often folks have cluster.routing.allocation.node_concurrent_recoveries and friends set far too high (2 is the default, and that's a good number), and indices.recovery.max_bytes_per_sec is another popular setting to turn up to unreasonable levels. How are these configured for you?
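You can check the effective values with the cluster settings API, for example:

```
GET _cluster/settings?include_defaults=true&flat_settings=true&filter_path=*.cluster.routing.allocation.node_concurrent_recoveries,*.indices.recovery.max_bytes_per_sec
```

`include_defaults` makes the response show values you haven't overridden, and `filter_path` trims the output to just the settings of interest.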

We are set for the default 2 concurrent recoveries, and:

```
'indices': {'recovery': {'max_bytes_per_sec': '375mb'}},
```

375MB/s (equivalent to 3Gbps) seems pretty punchy; I'd recommend toning that down if your cluster struggles when moving shards around. I imagine you can leave it higher on the SSD-only nodes, assuming your network can cope at least.
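Lowering it dynamically might look something like this (the 100mb figure is just an illustrative starting point, not a recommendation from this thread; tune it to what your NFS mounts can sustain):

```
PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}
```

Note that a dynamic cluster setting applies uniformly to every node and takes precedence over elasticsearch.yml, so to keep a higher limit on the SSD-only nodes you'd instead set indices.recovery.max_bytes_per_sec per node in each node's elasticsearch.yml and leave the dynamic setting unset.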