Setting cluster restart settings


I am trying to set some sane values on the gateway recovery settings and wanted to make sure that I am understanding these correctly.

Assume I have a cluster with 3 master-eligible nodes and 10 data nodes, with a replication factor of 3 (2 replicas of each primary).

I want to make sure I understand these settings, so these are the assumptions I am making about a full cluster restart when using these settings:

discovery.zen.minimum_master_nodes: 2
gateway.expected_master_nodes: 3
gateway.expected_data_nodes: 10
gateway.recover_after_time: 10m
gateway.recover_after_master_nodes: 2
gateway.recover_after_data_nodes: 8
  1. At least 2 masters will have to come up before anything happens, since a master has to be elected before we do anything (controlled by the minimum_master_nodes setting)
  2. If all the nodes come up within 10 minutes, then there will be no recovery. Is this the right way of thinking about the expected_* settings?
  3. If not all nodes come up within 10 minutes, then recovery will start once at least 2 master-eligible nodes and 8 data nodes are up. I've set master nodes to 2 since that is enough for a quorum, and data nodes to 8 since at most 2 data nodes can be down while still guaranteeing that every shard survives as either a primary or a replica. Is that the right way of thinking about the recover_after_* settings?
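The arithmetic behind point 3 can be sketched as follows (a minimal illustration, assuming 10 data nodes and number_of_replicas: 2, i.e. 3 copies of each shard; the function name is my own, not an Elasticsearch API):

```python
def min_nodes_for_full_data(total_data_nodes: int, replicas: int) -> int:
    """Smallest number of data nodes that must be up so that every shard
    still has at least one surviving copy (primary or replica)."""
    # Each shard has 1 primary + `replicas` copies on distinct nodes, so up
    # to `replicas` nodes can be missing without losing every copy of a shard.
    max_missing = replicas
    return total_data_nodes - max_missing

print(min_nodes_for_full_data(10, 2))  # 8, the value used for recover_after_data_nodes
```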

What happens if I set the recover_after_* settings to values lower than these (e.g. recover_after_master_nodes = 1 and recover_after_data_nodes = 4)? Will that result in unallocated shards that need to be forced into reallocation (with potential data loss)?

Sorry if these are answered in the docs; I could only find this reference, which is not 100% clear to me (maybe an example would help :) ). Feel free to point me to other docs if available.


Technically, "recovery" is the name for the normal process of starting up a shard, whether on creation, after a restart, or after a failure, so in that sense recovery always happens. It is, however, generally quicker to recover an existing shard copy than to build a new replica from scratch, and by waiting for all the nodes to join the cluster you raise the chances that Elasticsearch will re-use the existing copies where it can. This is not guaranteed, though, and Elasticsearch may decide to allocate replicas elsewhere. Version 7.5.0 includes some improvements in this area and is better at re-using existing data than earlier versions.

The rest of your understanding looks correct.

I think there is no need for gateway.expected_master_nodes or gateway.recover_after_master_nodes in your cluster. Nothing can happen until there are ≥2 masters anyway (so gateway.recover_after_master_nodes: 2 does nothing) and once 2 masters have joined there is little point in waiting for the third.
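Following that advice, the gateway configuration could be trimmed to something like this (a sketch in elasticsearch.yml form, keeping only the settings discussed above):

```yaml
# Master quorum still gates everything, so no gateway.*_master_nodes needed.
discovery.zen.minimum_master_nodes: 2

# Start recovery immediately once all data nodes are back...
gateway.expected_data_nodes: 10
# ...or after 10 minutes if at least 8 data nodes have joined.
gateway.recover_after_time: 10m
gateway.recover_after_data_nodes: 8
```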

With number_of_replicas: 2, if you set recover_after_data_nodes lower than 8 then indeed some shards may be completely unavailable. You should not force their allocation, however, since doing so irrecoverably loses their data. It is better to leave them alone until the remaining nodes come back up; at least that way you can still search the data that is present. In the worst case, you can restore the lost data from a snapshot.

Note that these settings only delay the start of the cluster recovery process. Even if you set recover_after_data_nodes: 8 then there may be a period of time where the shards are still being assigned and the cluster health is red. To wait for the end of the recovery process, use the cluster health API with the ?wait_for_status=green option.
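For instance, the wait could look something like this (a sketch assuming a node reachable on localhost:9200; the timeout value is illustrative):

```shell
# Block until the cluster reports green, or give up after 10 minutes.
curl -s "localhost:9200/_cluster/health?wait_for_status=green&timeout=10m"
```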

Thank you so much for your response, this definitely clarified things.
