Rolling Restarts: node_left.delayed_timeout vs allocation.enable

What are the benefits and drawbacks of using index.unassigned.node_left.delayed_timeout vs cluster.routing.allocation.enable when doing a rolling restart?

From what I understand:

When doing a rolling restart, you'd want to increase index.unassigned.node_left.delayed_timeout (default 1m) to something reasonable (5m for us), or to something pretty high for manual maintenance (2h?). Yet the docs for rolling restarts recommend setting cluster.routing.allocation.enable to none (which should really be new_primaries...).

Am I missing something or oversimplifying?

Thanks,
Ryan

Interestingly, Azure recommends using delayed_timeout.

The second (cluster.routing.allocation.enable) is the original method of stopping reallocation during node restarts. The first (index.unassigned.node_left.delayed_timeout) is probably the better way to do it now.
You may still want to stop all allocation for reasons unrelated to restarts, though, so having both options makes sense.
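
To make the two options concrete, here is a rough Python sketch that only builds the request for each approach and prints it, without contacting a cluster. The helper names (delayed_timeout_request, allocation_enable_request) are made up for illustration, the 5m value is the one from the question, and the use of a transient cluster setting is an assumption. Check the endpoints and bodies against the docs for your Elasticsearch version before relying on them.

```python
import json

# Values accepted by cluster.routing.allocation.enable.
ALLOCATION_MODES = {"all", "primaries", "new_primaries", "none"}


def delayed_timeout_request(timeout="5m", index="_all"):
    """Hypothetical helper: index-settings update that delays reallocation
    of replica shards after a node leaves the cluster."""
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return "PUT", f"/{index}/_settings", body


def allocation_enable_request(mode="none"):
    """Hypothetical helper: cluster-settings update that restricts shard
    allocation cluster-wide. 'transient' is an assumption; it does not
    survive a full cluster restart."""
    if mode not in ALLOCATION_MODES:
        raise ValueError(f"unknown allocation mode: {mode!r}")
    body = {"transient": {"cluster.routing.allocation.enable": mode}}
    return "PUT", "/_cluster/settings", body


if __name__ == "__main__":
    # Print both requests so they can be compared side by side.
    for method, path, body in (
        delayed_timeout_request("5m"),
        allocation_enable_request("none"),
    ):
        print(method, path)
        print(json.dumps(body, indent=2))
```

With the second approach you also have to remember to set cluster.routing.allocation.enable back to all once the node has rejoined, whereas the delayed timeout is a per-index setting that can simply stay in place between restarts.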

I've raised https://github.com/elastic/elasticsearch/issues/19739 as a discussion issue :slight_smile:
