Recovery settings - Clarification on when they're used

(Chris Fraschetti) #1


Specific to the above settings, the documentation (referenced above) mainly calls them out with regard to a full cluster restart:
"The gateway module allows one to store the state of the cluster meta data across full cluster restarts."

I'm looking to confirm how these settings (or other settings) come into play with an already fully started cluster.

-Do these settings apply to the recovery policies when a node leaves/joins the cluster?

-Are these node specific - each node reads its local values and starts recovery based on its local settings?
-Is recovery coordinated from the master - in which case, are these settings read from the master's config or the data node's config?

I'm looking to validate that I can tell my cluster there are normally N nodes, and that automatic recovery from a failure of one or more nodes should only occur as long as at least N-4 nodes are still online.

I expect this is what recover_after_data_nodes and recover_after_time will do for me, but I'm looking for clarification on that point and on what setting these values on the data nodes versus the dedicated masters might do.

Thanks in advance!

(Mark Walkom) #2


I'm not sure how this works if you have differing settings to be honest.

Yes. Recovery is managed by the master. See previous answer on the settings and possible conflict, but I will see if I can get someone to clarify this.

(Mark Walkom) #3

The way we handle differences is that we only take whatever the current master has set; values configured on other nodes are not used.
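
To make that concrete, here is a rough sketch in Python of how such a gate could be evaluated using only the master's values. It is a simplified model based on how the gateway docs describe these settings for a full cluster restart, not Elasticsearch's actual code. The GatewaySettings class and should_start_recovery function are hypothetical names made up for illustration, and expected_data_nodes is included as an assumption since it was not mentioned in the question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewaySettings:
    # Hypothetical container mirroring the gateway.* settings discussed above.
    recover_after_data_nodes: int               # hard minimum before recovery may start
    expected_data_nodes: Optional[int] = None   # assumption: the "normal" cluster size N
    recover_after_time_s: float = 0.0           # how long to wait for stragglers


def should_start_recovery(master: GatewaySettings,
                          data_nodes_joined: int,
                          seconds_waited: float) -> bool:
    """Simplified gate: only the current master's settings are consulted."""
    # Never recover below the configured minimum number of data nodes.
    if data_nodes_joined < master.recover_after_data_nodes:
        return False
    # If every expected data node is present, there is nothing to wait for.
    if (master.expected_data_nodes is not None
            and data_nodes_joined >= master.expected_data_nodes):
        return True
    # Otherwise wait out the grace period before recovering with what we have.
    return seconds_waited >= master.recover_after_time_s


# Example: N = 10 data nodes, tolerate up to 4 missing, wait 5 minutes.
master_settings = GatewaySettings(
    recover_after_data_nodes=6,
    expected_data_nodes=10,
    recover_after_time_s=300.0,
)
```

With these example values, recovery would not start with 5 data nodes no matter how long the wait, would start immediately with all 10, and would start with 7 only once the 300 seconds have passed. Whatever the data nodes have in their own config files plays no part in the decision.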
