Hi,
I am trying to set some sane values on the gateway recovery settings and wanted to make sure that I am understanding these correctly.
Assume I have a cluster with 3 master-eligible nodes and 10 data nodes, with a replication factor of 3 (2 replicas of each primary).
These are the assumptions I am making about what happens after a full cluster restart with the following settings:
discovery.zen.minimum_master_nodes: 2
gateway.expected_master_nodes: 3
gateway.expected_data_nodes: 10
gateway.recover_after_time: 10m
gateway.recover_after_master_nodes: 2
gateway.recover_after_data_nodes: 8
- At least 2 masters will have to come up before anything happens, since a master has to be elected before we do anything (controlled by the minimum_master_nodes setting).
- If all the nodes come up within 10 minutes, then there will be no recovery. Is this the right way of thinking about the expected_* settings?
- If not all nodes come up within 10 minutes, then recovery will start once at least 2 masters and 8 data nodes are up. I've set master nodes to 2 since this is enough for quorum, and data nodes to 8 since, with 3 copies of each shard, at most 2 data nodes can be down while still guaranteeing that every shard has at least a primary or a replica available. Is that the right way of thinking about the recover_after_* settings?
What happens if I set the recover_after_* settings to values lower than these (e.g. recover after master = 1 and data = 4)? Will that result in unallocated shards which will need to be forced for reallocation (with potential data loss)?
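To spell out the arithmetic behind the values 2 and 8, here is a small sketch of my reasoning. It is not Elasticsearch code; the function names are made up for illustration, and it assumes that the copies of any one shard always sit on different data nodes.

```python
def master_quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes (hypothetical helper)."""
    return master_eligible // 2 + 1


def max_data_nodes_down(copies_per_shard: int) -> int:
    """Most data nodes that can be missing while every shard is still
    guaranteed at least one copy. Assumes the copies of a shard are on
    distinct nodes (hypothetical helper)."""
    return copies_per_shard - 1


master_eligible = 3
data_nodes = 10
copies_per_shard = 3  # 1 primary + 2 replicas

# Value I chose for recover_after_master_nodes / minimum_master_nodes
print(master_quorum(master_eligible))                       # 2

# Value I chose for recover_after_data_nodes
print(data_nodes - max_data_nodes_down(copies_per_shard))   # 8

# With recover_after_data_nodes = 4, up to 6 data nodes could still be absent,
# which is more than the 2 that the replica count can tolerate.
missing_with_4 = data_nodes - 4
print(missing_with_4 > max_data_nodes_down(copies_per_shard))  # True
```

If this reasoning holds, any value below 8 for data nodes means some shard could have all three of its copies on nodes that have not joined yet, which is the case I am asking about.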
Sorry if these are answered in the docs. I could only find this reference, https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html, which is not 100% clear to me (maybe an example would help). Feel free to point me to other docs if available.
Thanks,
Giorgos