Is it possible to "disable" the node without shutting it down completely?

Victor_S · August 28, 2018, 1:25pm

(And to re-enable it later without restarting the process?)
That is, can I make the master node promote replica shards on other nodes to primaries, start the node_left timer, and so on?

Bernt_Rostad · August 28, 2018, 2:10pm

This happens automatically whenever a node falls out of the cluster: replica shards on the other nodes get promoted to primary shards to replace the primaries on the node that fell out. When the node returns to the cluster, it will contain only replica shards.

What I don't understand is why you want to "move" the primary shards away from one specific node. This is kind of pointless, since the replica shards also have to be updated whenever a primary is, and they usually get as many search requests to process as the primaries. And even if you "move" all primary shards to other nodes, they will trickle back when other nodes fall out of the cluster (which will happen) or when a new index is created in the cluster.

Victor_S · August 28, 2018, 2:41pm

Actually, I'm planning a rolling upgrade, but I need a way to gracefully kick the node out of the cluster first, before sending it SIGTERM. If the node is still in the cluster, SIGTERM may take longer to process (e.g. because of pending index requests), and the underlying infrastructure may then kill the process after a timeout, which is not desirable.
There's software under the hood that allows implementing a two-phase shutdown procedure. The first phase is supposed to be asynchronous: send a "graceful shutdown" request, wait until all background tasks on that node are finished (e.g. by periodically polling the node), and only then send SIGTERM (which should be handled much faster this time).
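
For illustration, the two-phase procedure described above could be sketched roughly as follows. This is only a sketch under stated assumptions: `request_drain`, `is_drained`, and `send_sigterm` are hypothetical callables supplied by the surrounding infrastructure, and the thread does not say what the "graceful shutdown" request would actually be on the Elasticsearch side. The sketch also assumes that SIGTERM is sent even when the drain phase times out; a different policy may suit other setups.

```python
import time
from typing import Callable


def two_phase_shutdown(
    request_drain: Callable[[], None],   # hypothetical: sends the "graceful shutdown" request
    is_drained: Callable[[], bool],      # hypothetical: polls the node for pending work
    send_sigterm: Callable[[], None],    # hypothetical: delivers SIGTERM to the node process
    poll_interval_s: float = 5.0,
    drain_timeout_s: float = 600.0,
) -> bool:
    """Phase 1: request a drain and poll until idle. Phase 2: send SIGTERM.

    Returns True if the node reported itself drained before SIGTERM was
    sent, False if the drain phase timed out. SIGTERM is sent either way
    (an assumption of this sketch).
    """
    request_drain()
    deadline = time.monotonic() + drain_timeout_s

    # Poll until the node is idle or the drain deadline passes.
    drained = is_drained()
    while not drained and time.monotonic() < deadline:
        time.sleep(poll_interval_s)
        drained = is_drained()

    send_sigterm()
    return drained
```

Passing the three steps in as callables keeps the polling logic independent of how the drain is requested or detected, which is the part this thread leaves open.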

Bernt_Rostad · August 28, 2018, 2:50pm

I've implemented a controller to do something similar; it sends a SIGTERM (kill -15) to the running Elasticsearch process and waits for it to shut down gracefully. It uses SIGKILL (kill -9) only for emergency stops or if the SIGTERM fails to stop the process within a given time limit. The controller works equally well whether the node has just replica shards or primary shards.
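
A minimal sketch of that SIGTERM-then-SIGKILL pattern, not the actual controller from this post: it assumes the process was started by the controller itself as a `subprocess.Popen` child on a POSIX system, and the function name `stop_gracefully` and the default timeout are made up for illustration.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, timeout_s: float = 60.0) -> str:
    """Send SIGTERM, wait up to timeout_s, then fall back to SIGKILL.

    Returns "already-exited", "terminated", or "killed".
    """
    if proc.poll() is not None:
        return "already-exited"

    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout_s)
        return "terminated"
    except subprocess.TimeoutExpired:
        # The process did not stop within the time limit: force it.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

For a process that is not a child of the controller, the same idea would need `os.kill` with the process ID and a separate liveness check instead of `Popen.wait`.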

Victor_S · August 28, 2018, 3:16pm

I have a similar thing, but I'd like to avoid killing the process whenever possible, so I need a way to ensure that SIGTERM will always be handled as fast as possible.
