Using gateway.recover_after_data_nodes to minimize recovery time in an Azure IAAS environment

casieowen · February 21, 2017, 9:08pm

Hi,

The question is: if the number of data nodes in the cluster drops below gateway.recover_after_data_nodes, will the cluster go red and/or stop accepting writes? (I know I can test this, but it requires a full cluster restart, which takes a loooong time.)

Context/reason for asking:

We run a cluster in Azure (IAAS). We have configured update domains in Azure, so in theory their updates should not affect us. In practice, if the interval between updates to VMs or VM hosts is too short for shard reallocation/rebalancing to finish, our cluster can go red, and recovery can take a long time because nodes are being restarted while we're still accepting writes.

We're thinking of ways to mitigate this. Currently, we have gateway.recover_after_data_nodes set to n-1 (we have 11 data nodes, so it's set to 10). If the cluster goes red (and/or stops accepting writes) whenever the number of data nodes falls below the gateway.recover_after_data_nodes value set in the yml, that may reduce the recovery time. Any thoughts or input on this would be appreciated.
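
For reference, the setting described above would look roughly like this. This is only a sketch of the one relevant line: the file name (elasticsearch.yml) is an assumption based on "the yml", and every other setting in the config is omitted.

```yaml
# elasticsearch.yml (assumed file name), relevant line only
# 11 data nodes in the cluster, so n-1 = 10
gateway.recover_after_data_nodes: 10
```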

We're also working with the Azure folks who focus on Elasticsearch to get notified of such updates and to explore additional options.

Thanks,
Casie

system · March 21, 2017, 9:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
