Gracefully trigger re-election of master node

I'm currently working on some automation for running Elasticsearch in a clustered fashion on top of Kubernetes, and would love to be able to manually trigger a master re-election (or alternatively, disallow the current master from being master, similar to setting cluster.routing.allocation.exclude for shard allocation). Right now, upon a scale-down event involving the master node, the cluster can turn red for up to 30s (thus serving no requests).

This downtime can, and really should, be avoided, but so far there seems to be no graceful way to do so. This is a request/thread to open discussion on how this could be implemented, coming out of the GitHub issue here: https://github.com/elastic/elasticsearch/issues/17493
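For reference, the shard-allocation analogue I mentioned above looks roughly like this (the IP below is just a placeholder); what I'm hoping for is an equivalent knob for master eligibility:

```sh
# Before scaling down a data node, exclude it from shard allocation so shards
# drain off it gracefully. cluster.routing.allocation.exclude.* is a dynamic
# cluster setting; the IP here is a placeholder for the node being removed.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}'
```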

Thanks!

Right now, upon a scale-down event involving the master node, the cluster can turn red for up to 30s (thus serving no requests).

My first question is about this ^^. Why 30 seconds? This should be around ~3s under normal operation.

I've not dug into why the time was around 30s, but I'd imagine it could be related to the discovery plugin in use?

Specifically, I'm using the fabric8 discovery plugin here: https://github.com/fabric8io/elasticsearch-cloud-kubernetes

I have attempted to use DNS/ping discovery, but ran into separate issues with that (unrelated to this particular one).

Some of our users would find any period of downtime like this unacceptable, so I feel like even reducing this time is not a true solution to this feature request :slight_smile:

While we do have plans to speed up the 3s for the case where the master left and all nodes respond promptly, these are not coming soon (it's non-trivial, to say the least). That said, I think we should clarify what those 3s mean - by default, searches and gets will be served fine. Indexing operations will wait until a new master is elected and then proceed as before - no request should be rejected. Are you seeing something else?
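If it helps, a rough way to see this for yourself is to poll a search and an index request while the master is taken down - something like this sketch (the index name is a placeholder):

```sh
# Rough probe: poll a search and an index request once a second while the
# master is restarted, to see which calls keep being served and which wait.
# "my-index" is a placeholder; adjust the host/port for your setup.
while true; do
  date
  curl -s -o /dev/null -w 'search: %{http_code} %{time_total}s\n' \
    'http://localhost:9200/my-index/_search?size=0'
  curl -s -o /dev/null -w 'index:  %{http_code} %{time_total}s\n' \
    -XPOST 'http://localhost:9200/my-index/doc' -d '{"probe": true}'
  sleep 1
done
```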

PS - you should find out why election takes 30s - it's indicative of something else that's wrong.

Ah okay, thanks.

I've not done particularly thorough testing of what does & does not work during this period. I have been using Kibana to monitor my cluster, and it goes all red during this time, hence I assumed that the majority of cluster operations were not functional.

I'll run some tests now to determine exactly how long the cluster is unavailable, and what exactly is unavailable, and get back to you. Is the re-election timeout configurable from 3s? (I ask so I can check whether mine has been set to anything other than 3s!)

I've not done particularly thorough testing of what does & does not work during this period, I have been using Kibana to monitor my cluster

Yes, losing a master does make the cluster go red during election. It's not a "lite" event ...

I assumed that the majority of cluster operations were not functional.

All operations should either be served or wait for a new master to be elected, timing out after a reasonable period (30s for master-level operations like creating an index, 60s for indexing).
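Those waits correspond to the per-request timeout parameters, so they can also be tuned per call - roughly like this (the index name is a placeholder):

```sh
# Master-level operation: waits up to master_timeout (default 30s) for the
# master to process the request before giving up. "my-index" is a placeholder.
curl -XPUT 'http://localhost:9200/my-index?master_timeout=30s'

# Indexing: waits up to timeout (default 1m) before failing the request.
curl -XPOST 'http://localhost:9200/my-index/doc?timeout=60s' -d '{"field": "value"}'
```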

Is the re-election timeout configurable from 3s?

The setting is discovery.zen.ping_timeout. See here.
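For completeness, it's a static node-level setting, so it has to go into elasticsearch.yml (or the node's startup options) and takes effect on restart - e.g. something along these lines (the config path is just a typical default):

```sh
# discovery.zen.ping_timeout is a static node setting (default 3s), so it is
# set in elasticsearch.yml rather than via the cluster settings API.
# The config path below assumes a typical package install.
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
discovery.zen.ping_timeout: 3s
EOF
```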

If you can track this issue down to the K8s ES plugin, have you considered opening a ticket for it?

Btw, would you mind sharing more details about your ES version and K8s plugin version?
