Controlled rotation of Elasticsearch data nodes while enabling shard allocation awareness

Hi All,

We are trying to enable shard allocation awareness on the "zone" attribute in our Elasticsearch cluster while rotating the data nodes one at a time, and we want to do this in a controlled manner. Initially, none of the data nodes has a value set for the "zone" attribute. These are the steps we followed:

  1. Enable shard allocation awareness on the master nodes by setting
cluster.routing.allocation.awareness.attributes: zone
  2. Disable shard allocation on the cluster using the following
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}
'
  3. Boot new data nodes with the "zone" attribute set.
  4. Exclude one of the old data nodes so that its shards move to the new data nodes that have the "zone" attribute.
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "x.y.y.y"
  }
}
'

But this is not working: none of the shards on the excluded old data node move to the new data nodes. The only option we can see is to move the shards one by one off the old data node using the "_cluster/reroute" API, but this feels wrong. Also, if we re-enable shard allocation by setting "cluster.routing.allocation.enable" to "all", all the shards on the old nodes will start moving to the new data nodes at once, which might make the cluster unstable.
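For readers unfamiliar with the manual route mentioned above, a single "move" command sent to "_cluster/reroute" looks roughly like the sketch below. The helper names, the ES_URL variable, and the index and node names in the example are made up for illustration; only the endpoint and the shape of the "move" command come from the Elasticsearch API.

```shell
#!/bin/sh
# ES_URL is an assumption; the thread talks to localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Print a reroute body that moves one shard copy between two nodes.
# Arguments: index name, shard number, source node name, target node name.
move_shard_body() {
  printf '{"commands":[{"move":{"index":"%s","shard":%s,"from_node":"%s","to_node":"%s"}}]}' \
    "$1" "$2" "$3" "$4"
}

# Send the move command to the cluster (hypothetical wrapper, not called here).
move_shard() {
  curl -s -X POST "$ES_URL/_cluster/reroute?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(move_shard_body "$1" "$2" "$3" "$4")"
}

# Example with placeholder names:
#   move_shard my-index 0 old-data-1 new-data-1
```

Doing this for every shard by hand is exactly the tedium described above, which is why an automatic approach is preferable.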

Is there any way to achieve a controlled rollout?

PS: We cannot simply restart the existing nodes with the attribute added, because we also want to make a few more changes to the nodes at the infra level.

Thanks in advance

Setting "cluster.routing.allocation.enable" to "none" disables shard allocation, so it is expected that the shards will not move to the new nodes. The documentation describes this value as:

"No shard allocations of any kind are allowed for any indices."

You need to set it to "all", or at least "primaries", so that the shards of the current indices can be allocated on the new nodes.
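As a rough sketch, re-enabling allocation uses the same cluster settings call as the original step that disabled it, only with a different value. The helper names and the ES_URL variable are illustrative assumptions; "persistent" is used because that is the scope the setting was originally written to.

```shell
#!/bin/sh
# ES_URL is an assumption; the thread talks to localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Print the settings body for an allocation mode.
# Valid modes for this setting: all, primaries, new_primaries, none.
allocation_enable_body() {
  printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Apply the mode through the cluster settings API (hypothetical wrapper, not called here).
set_allocation_enable() {
  curl -s -X PUT "$ES_URL/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_enable_body "$1")"
}

# Example:
#   set_allocation_enable all
```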

I do not entirely get what you are trying to do.

Are you just swapping nodes? Removing some old nodes and adding new nodes?

Yes, @leandrojmp. We are trying to swap the nodes and enable shard allocation awareness at the same time.

By default Elasticsearch will only move up to two shards at once, to avoid any such instability.
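Taken together with the earlier reply, one possible way to roll over gradually would be to leave allocation enabled, exclude one old node at a time, and wait until that node holds no shards before moving on to the next. The sketch below is only an illustration of that idea: the function names, ES_URL, and node names are hypothetical, and the counting helper assumes node names contain no spaces. It relies on the cluster settings API accepting null to reset a setting and on the column selection of "_cat/shards".

```shell
#!/bin/sh
# ES_URL is an assumption; the thread talks to localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Print a settings body that excludes the given IP, or clears the
# exclusion (null resets the setting) when called with an empty argument.
exclude_ip_body() {
  if [ -n "$1" ]; then
    printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}' "$1"
  else
    printf '{"transient":{"cluster.routing.allocation.exclude._ip":null}}'
  fi
}

# Count shard copies still on a node. Reads the output of
#   _cat/shards?h=index,shard,prirep,state,node
# on stdin; for a relocating shard the fifth column is the source node.
count_shards_on_node() {
  awk -v node="$1" '$5 == node { n++ } END { print n + 0 }'
}

# Example (not run here), with a placeholder node name:
#   curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,node" \
#     | count_shards_on_node old-data-1
```

Once the count reaches zero the old node can be shut down, and the exclusion can be cleared or pointed at the next node.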

@DavidTurner, which of these two properties are you referring to?

cluster.routing.allocation.node_concurrent_recoveries
cluster.routing.allocation.cluster_concurrent_rebalance

Also, if we want to stop recoveries and rebalancing during peak traffic hours, can we set these to zero? Would that solve my problem of a controlled rollout?

Those two settings are related, yes, but I strongly recommend you leave them at their default values always.

If your cluster is properly configured then there should be no need to avoid recoveries during peak traffic hours. "Properly configured" here includes leaving the settings you mentioned above at their default values.

Thanks for the reply, @DavidTurner. I have one more question, about controlling the speed of recovery. Which of the following two properties would help with that?

indices.store.throttle.max_bytes_per_sec (since we are on 5.6)

indices.recovery.max_bytes_per_sec
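For context, indices.recovery.max_bytes_per_sec is a dynamic setting that limits recovery traffic and can be changed through the same cluster settings API used in the steps above. A minimal sketch follows; the helper names, ES_URL, and the "20mb" value in the example are illustrative assumptions rather than recommendations.

```shell
#!/bin/sh
# ES_URL is an assumption; the thread talks to localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Print a settings body that sets the recovery rate limit, e.g. "20mb".
recovery_rate_body() {
  printf '{"transient":{"indices.recovery.max_bytes_per_sec":"%s"}}' "$1"
}

# Apply the limit through the cluster settings API (hypothetical wrapper, not called here).
set_recovery_rate() {
  curl -s -X PUT "$ES_URL/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(recovery_rate_body "$1")"
}

# Example with an illustrative value:
#   set_recovery_rate 20mb
```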

Oh, sorry, the responses I gave above assumed you were on a supported & maintained version (7.17 or later). 5.6 passed EOL over 5 years ago and I don't remember how things worked back then. You need to upgrade as a matter of urgency.
