Master election takes minutes

Recently I upgraded my 3 master-eligible nodes from Ubuntu 16.04 to 18.04. The master changed twice during this procedure. I noticed that these two master elections took about 1 minute and 4 minutes, respectively. This is longer than I expected.

Does anyone know why it is taking this long? Previously it was only a few seconds.

My master configuration looks like this for master0 (the other two are similar):

node.name: master0
node.master: true
node.data: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host:
  - _site_
  - _local_
discovery.zen.ping.unicast.hosts: ["", "", "", "", "", ""]
cluster.initial_master_nodes:
  - master0
  - master1
  - master2
discovery.zen.minimum_master_nodes: 2
xpack.watcher.enabled: false
xpack.monitoring.collection.enabled: true

Here is some additional information and my own observations:

  • I am using Elasticsearch 7.5.2. Of course one could suggest to upgrade first but I'm trying to understand what goes wrong before I do.
  • I still use discovery.zen.ping.unicast.hosts and discovery.zen.minimum_master_nodes from a previous Elasticsearch 6.7 upgrade. These settings are deprecated; the former is mapped to discovery.seed_hosts and the latter is ignored. It seems to me that these should not play a role here.
  • The discovery.zen.ping.unicast.hosts list also contains a couple of data nodes. I understand that on Elasticsearch 7.x these are ignored for discovery, so they should not play a role either, I guess.
  • I left cluster.initial_master_nodes in, even though it is only required when the cluster has not been formed yet. I assumed it's ignored then.
  • My master nodes use DHCP (not my call), but fortunately the DHCP server gives them a fixed IP address. Ubuntu and Debian use a trick for such hosts where the hostname resolves to 127.0.1.1 in /etc/hosts. But Elasticsearch is not listening on this IP address because I configured _local_. The documentation says that _local_ means: "Any loopback addresses on the system, for example 127.0.0.1". You could interpret this to include 127.0.1.1 too. Could this be an issue? Master1's log mentions something about discovery using 127.0.1.1. Perhaps I should adjust network.host in the Elasticsearch config instead?
  • Master nodes are running on VMware with 4 vCPUs and 16 GB RAM. I don't know about the exact storage but assume that it is host attached. Data nodes are dedicated hardware with NVME SSDs.
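Since I don't know the exact storage, one way I could characterise it is to measure small synced-write latency, since writing cluster-state metadata involves many small fsynced files. A rough sketch with GNU dd (/tmp/es-io-test is an arbitrary scratch file standing in for the filesystem that holds path.data):

```shell
# 500 writes of 4 KiB each, forced to disk individually with oflag=dsync.
# Point the output file at the filesystem holding path.data; /tmp is a
# stand-in here. dd prints the total time and throughput when done.
dd if=/dev/zero of=/tmp/es-io-test bs=4k count=500 oflag=dsync
```

On SSD-backed storage this should finish in well under a second; if each synced write takes tens of milliseconds, writing metadata for thousands of indices will be slow.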

I have included logs of the 3 master nodes below. What I did was:

  • Stopped Elasticsearch on master2. Master0 remained master. I upgraded the OS of master2 at 13:54:46, rebooted and started Elasticsearch.
  • Stopped Elasticsearch on master0 at 14:29:07. Master1 appears to be master at 14:30:16. I upgraded the OS of master0, rebooted and started Elasticsearch.
  • Stopped Elasticsearch on master1 at 15:39:23. After a few minutes a new master was still not elected so I started master1 again at 15:43:11. Master2 appears to be master at 15:44:29. I then stopped Elasticsearch on master1 again, upgraded its OS, rebooted and started Elasticsearch.

Thanks for reading to the end!

[2021-05-03T14:49:06,641][WARN ][o.e.g.IncrementalClusterStateWriter] [master0] writing cluster state took [48023ms] which is above the warn threshold of [10s]; wrote metadata for [4361] indices and skipped [0] unchanged indices

This message indicates two problems:

  • you have over 4000 indices
  • writing a few kB of metadata for those indices took nearly a minute

I suggest upgrading to pick up #50907 which will streamline things a bit, but the fundamental problem seems to be that you have too many indices and the disks on your master nodes are too slow.
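Back-of-the-envelope from that warn line: 48023 ms across 4361 indices is about 11 ms per index, which would be consistent with roughly one slow synced write per index:

```shell
# Per-index cost implied by the warn log: 48023 ms for 4361 indices
awk 'BEGIN { printf "%.1f ms per index\n", 48023 / 4361 }'
# → 11.0 ms per index
```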


Thanks, David. It's really appreciated. I had read your previous post but wasn't quite sure if it was a similar situation.

Do you still suggest increasing cluster.publish.timeout and/or cluster.join.timeout as a temporary workaround until we upgrade?
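For reference, these are static settings, so the change I have in mind would go in elasticsearch.yml on each master-eligible node. The values below are placeholders I picked above the 7.x defaults (30s and 60s respectively), not recommendations:

```yaml
# Hypothetical temporary workaround: raise both timeouts above the
# defaults to ride out slow post-election cluster-state writes
cluster.publish.timeout: 90s
cluster.join.timeout: 120s
```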

I was aware that we have a suboptimal number of shards for several of our indices. But I did not realise that the number of indices has this much of an impact on writing the cluster state to disk.

I am a bit surprised that writing a few kB per index would take this much time and that my disks are too slow. Especially because the documentation says that "cluster state updates are typically published as diffs to the previous cluster state".

Yes, your situation sounds similar to that older post; increasing those timeouts might help a bit.

It's true that cluster state updates are typically published as diffs, but the first update after an election is not typical: it usually cannot use the diff mechanism. In any case the issue isn't in how the update is published, it's in how the new state is written to disk. To avoid some subtle failure cases we have to re-write everything after an election.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.