Balance New Data Among Nodes

I am doing high-volume inserts into a cluster. Sometimes we take a node down for maintenance, or a node fails. When the node comes back up, its disk utilization is lower than that of the other nodes. As a result, new shards are created on this node at a much higher rate, and the volume of new data hitting it is very high after it rejoins the cluster.

Is there a setting to ensure new shard creation is evenly balanced across the cluster, and to let the rebalance (which we throttle, and run heavier at night) worry about disk utilization differences? Since we delete data after X days, the imbalance will also fix itself over time. It would be best for us to ensure good balance at insert time.

Thanks

I am trying the following with good results.

"cluster.routing.allocation.balance.index": "100f"

I am getting an even distribution of shards across nodes. Are there any dangers to look out for?

Is there a way to specify this for a few high-volume templates, and allow defaults to rule the rest?
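For reference, cluster.routing.allocation.balance.index is a cluster-level setting, so it is applied through the cluster settings API rather than through an index template. Below is a minimal sketch of that request, built with the Python standard library only. The node address, the helper name balance_index_request, and the choice between transient and persistent scope are assumptions for illustration; the request is built but not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster


def balance_index_request(value, persistent=False, base_url=ES_URL):
    """Build (but do not send) a PUT _cluster/settings request.

    value is the balance factor, e.g. "100f"; None resets it to the default.
    balance_index_request is a hypothetical helper, not part of any client library.
    """
    scope = "persistent" if persistent else "transient"
    body = {scope: {"cluster.routing.allocation.balance.index": value}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


raise_weight = balance_index_request("100f")
restore_default = balance_index_request(None)  # null clears the override

# To apply: urllib.request.urlopen(raise_weight)
```

Sending restore_default should return the cluster to the built-in weighting, which makes it easy to back out if the higher value causes unwanted shard movement.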

You can delay shard allocation when you take down a node for maintenance by setting index.unassigned.node_left.delayed_timeout to a high enough value, for instance "60m" for an hour's downtime. When the node comes up again, it will recover the same shards it had before it was taken down, meaning your cluster will be balanced just as it was before. Just make sure to reset delayed_timeout once you're done with the maintenance.
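The set-then-reset sequence described above could look roughly like the following sketch, which builds the two index settings requests with the Python standard library. The node address, the "_all" index target, and the helper name delayed_timeout_request are assumptions for illustration. Passing null to restore the default is how recent Elasticsearch versions behave; older versions may need an explicit value such as "1m" instead.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster


def delayed_timeout_request(timeout, index="_all", base_url=ES_URL):
    """Build (but do not send) a PUT <index>/_settings request.

    timeout is a time value such as "60m"; None resets the setting to its default.
    delayed_timeout_request is a hypothetical helper, not part of any client library.
    """
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


before_maintenance = delayed_timeout_request("60m")  # send before stopping the node
after_maintenance = delayed_timeout_request(None)    # send once the node has rejoined

# To apply: urllib.request.urlopen(before_maintenance)
```

Targeting "_all" covers every existing index; a narrower index pattern could be passed instead if only some indices live on the node being serviced.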

Of course, setting delayed_timeout means that while the node is down some of your indices will be in a yellow state, because the replicas that lived on that node will not be allocated to other nodes, so those shards lose their fault tolerance until the node returns. For a quick maintenance downtime this might be acceptable. I do this during rolling upgrades to stop shards from being allocated to other nodes while I do the job.
