Cannot elect master when restarting cluster after 7.3 upgrade

Hi team,
I have an 8-node ES cluster in dev.
I shut it all down, upgraded to 7.3.1, and restarted, and now I cannot get a master elected :frowning:
Here is a sample log entry (similar entries appear on all nodes):

 [2019-09-03T11:59:22,972][WARN ][o.e.c.c.ClusterFormationFailureHelper] [hdp15] master not discovered or elected yet, an election requires at least 4 nodes with ids from [3-dBP8WZQZO-Rymo0vC3mQ, EYrLIFzETzSwcUvvabPegg, jguIm07bTXC_8ot6fYN9bw, 1cT_ZkvzTHaSyiqDEi2GSw, 3BP0A2-JTTS-Vp_1YdnJrw, vp_SiTAYQyCE1TQVIUXx2A, bkVjp2mNSk6G0PH9QFBu9Q], have discovered [{hdp15}{vp_SiTAYQyCE1TQVIUXx2A}{oNdN1WTWTiGHCAvCbTsgTQ}{192.168.1.215}{192.168.1.215:9300}{dim}{ml.machine_memory=135445331968, xpack.installed=true, ml.max_open_jobs=20}, {hdp13}{SaA74YaMSNim17koypZiFQ}{iXExeBAEQgqlfUKHior14w}{192.168.1.213}{192.168.1.213:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true}, {hdp16}{3BP0A2-JTTS-Vp_1YdnJrw}{b8hcsCHnQP-JBy4mnkmOmQ}{192.168.1.216}{192.168.1.216:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true}, {hdp14}{bkVjp2mNSk6G0PH9QFBu9Q}{ZEui_lg3TKusPKjrglKGcQ}{192.168.1.214}{192.168.1.214:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [192.168.1.213:9300, 192.168.1.214:9300, 192.168.1.216:9300, 192.168.1.223:9302, 192.168.1.224:9302, 192.168.1.225:9302, 192.168.1.226:9302] from hosts providers and [{hdp15}{vp_SiTAYQyCE1TQVIUXx2A}{oNdN1WTWTiGHCAvCbTsgTQ}{192.168.1.215}{192.168.1.215:9300}{dim}{ml.machine_memory=135445331968, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 1785, last-accepted version 96 in term 1584

It seems it might be confused by the previous state of the cluster? I am happy to delete old cluster settings and restart if that is possible?


The message says that this node, hdp15, can't find enough of the previous master nodes to hold an election. Showing just the important bits:

an election requires at least 4 nodes with ids from [3-dBP8WZQZO-Rymo0vC3mQ, EYrLIFzETzSwcUvvabPegg, jguIm07bTXC_8ot6fYN9bw, 1cT_ZkvzTHaSyiqDEi2GSw, 3BP0A2-JTTS-Vp_1YdnJrw, vp_SiTAYQyCE1TQVIUXx2A, bkVjp2mNSk6G0PH9QFBu9Q], have discovered [{hdp15}{vp_SiTAYQyCE1TQVIUXx2A}, {hdp13}{SaA74YaMSNim17koypZiFQ}, {hdp16}{3BP0A2-JTTS-Vp_1YdnJrw}, {hdp14}{bkVjp2mNSk6G0PH9QFBu9Q}] which is not a quorum

It's found three of the nodes it needs (hdp14, hdp15 and hdp16), so it just needs one more.
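
The "at least 4" is a majority of the 7 node IDs in the voting configuration that was persisted before the restart:

  quorum = floor(7 / 2) + 1 = 4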

Hmmm. Taking this down another level: at this time I only have 4 nodes active (namely hdp13, 14, 15 & 16). What are these IDs, and is this where the problem might be?

an election requires at least 4 nodes with ids from [
3-dBP8WZQZO-Rymo0vC3mQ,
EYrLIFzETzSwcUvvabPegg,
jguIm07bTXC_8ot6fYN9bw,
1cT_ZkvzTHaSyiqDEi2GSw,
3BP0A2-JTTS-Vp_1YdnJrw,
vp_SiTAYQyCE1TQVIUXx2A,
bkVjp2mNSk6G0PH9QFBu9Q]

What are these IDs? Where did they come from?

have discovered [
{hdp15}{vp_SiTAYQyCE1TQVIUXx2A},
{hdp13}{SaA74YaMSNim17koypZiFQ},
{hdp16}{3BP0A2-JTTS-Vp_1YdnJrw},
{hdp14}{bkVjp2mNSk6G0PH9QFBu9Q}]

These are the only four active nodes, so why aren't they all in the list of nodes required for the election?
Thanks

Originally you said you had 8 nodes, but now you say you have stopped half of them. This is expected not to work. Quoting the docs:

... if you shut down half or more of the master-eligible nodes all at the same time then the cluster will normally become unavailable. If this happens then you can bring the cluster back online by starting the removed nodes again.

Sorry about the confusion. I have restarted the remaining 4 nodes. All 8 are now running.

It still doesn't work, but the error has changed. I have configured hdp13 as the master in cluster.initial_master_nodes.

Example error from hdp13:

[2019-09-03T14:52:52,845][INFO ][o.e.c.c.JoinHelper ] [hdp13] failed to join {hdp14}{bkVjp2mNSk6G0PH9QFBu9Q}{tfqd4mUsR-26x9ymImqa5w}{192.168.1.214}{192.168.1.214:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={hdp13}{SaA74YaMSNim17koypZiFQ}{4dRaECP_RpqlvzhqIQg02A}{192.168.1.213}{192.168.1.213:9300}{dim}{ml.machine_memory=135446564864, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [hdp14][192.168.1.214:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: node is no longer master for term 1950 while handling publication
at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1018) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:252) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.3.1.jar:7.3.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]

I notice that I have a mix of IPs and hostnames, and when I look at a node I get this, for example:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "es_cluster",
  "nodes" : {
    "SaA74YaMSNim17koypZiFQ" : {
      "name" : "hdp13",
      "transport_address" : "192.168.1.213:9300",
      "host" : "192.168.1.213",
      "ip" : "192.168.1.213",
      "version" : "7.3.1",
      "build_flavor" : "default",
      "build_type" : "rpm",
      "build_hash" : "4749ba6",
      "roles" : [
        "ingest",
        "master",
        "data"
      ],

There are name, host and ip values there; are these OK?

This setting is ignored, because this cluster has already started.
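
It's only read the very first time a brand-new cluster bootstraps itself, so once the cluster has formed you can remove it from elasticsearch.yml on every node. In your case that would be a line something like this:

  cluster.initial_master_nodes: ["hdp13"]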

The message you quote isn't in itself an error - it's something that can happen (particularly if you have more master-eligible nodes than the recommended 3) but should resolve itself eventually when Elasticsearch retries. Can you share more logs? Ideally a few minutes' worth from each node, maybe using https://gist.github.com since there'll be too much to share here.

Thanks. Hmm. It looks like nodes are all triggering elections that conflict with each other, which is something that can happen with too many master-eligible nodes, but I've not seen it take so long to sort itself out even with 8 of them.

Can you set logger.org.elasticsearch.cluster.coordination: TRACE in elasticsearch.yml and then restart this node? This will produce a lot of logs, but will hopefully shed more light on why the elections are not converging. After a few minutes you can remove that setting and restart the node again to get back to normal log levels.
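
In elasticsearch.yml that is just the one line (a sketch; take it out again once you have the logs):

  # temporary: very verbose, for diagnosing cluster formation only
  logger.org.elasticsearch.cluster.coordination: TRACE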

Thanks, I found this message in these logs:

[2019-09-03T15:42:02,253][INFO ][o.e.c.c.JoinHelper       ] [hdp13] failed to join {hdp16}{3BP0A2-JTTS-Vp_1YdnJrw}{fdSyGsDPTvCGSjYK5kSIlw}{192.168.1.216}{192.168.1.216:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={hdp13}{SaA74YaMSNim17koyp
org.elasticsearch.transport.RemoteTransportException: [hdp16][192.168.1.216:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publication failed
  at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$3.onFailure(Coordinator.java:1378) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.common.util.concurrent.ListenableFuture$1.run(ListenableFuture.java:101) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:54) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication.onCompletion(Coordinator.java:1302) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.cluster.coordination.Publication.onPossibleCompletion(Publication.java:124) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.cluster.coordination.Publication.cancel(Publication.java:88) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$2.run(Coordinator.java:1269) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.1.jar:7.3.1]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
  at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: org.elasticsearch.ElasticsearchException: publication cancelled before committing: timed out after 30s
  at org.elasticsearch.cluster.coordination.Publication.cancel(Publication.java:85) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$2.run(Coordinator.java:1269) ~[elasticsearch-7.3.1.jar:7.3.1]
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.1.jar:7.3.1]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
  at java.lang.Thread.run(Thread.java:835) ~[?:?]

specifically publication cancelled before committing: timed out after 30s. Digging further, I see this:

[2019-09-03T15:41:32,310][TRACE][o.e.c.c.CoordinationState] [hdp13] handlePublishRequest: accepting publish request for version [155] and term [5290]
...
[2019-09-03T15:42:08,779][TRACE][o.e.c.c.Coordinator      ] [hdp13] handlePublishRequest: coordinator remaining FOLLOWER of [{hdp16}{3BP0A2-JTTS-Vp_1YdnJrw}{fdSyGsDPTvCGSjYK5kSIlw}{192.168.1.216}{192.168.1.216:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true}] in term 529

These two messages are written by the thread processing the cluster state update on this node and show that it took ~36 seconds to write the new cluster state to disk, which is far too long for the cluster to form successfully. What kind of storage are you using? How many indices are there in this cluster?

There are 2 instances of ES running per host. They are big, powerful boxes: 4 of them, so 8 ES nodes in total.
There are 8 spinning disks on each host, 4 per ES node.
I don't know how many indices there are, but it is quite a large number.
Without the cluster running, how do I find the number of indices?

OK, I think that explains it. The exact number isn't too important. The first cluster state update after an election requires each master-eligible node to write one (small) file per index, and on spinning disks it's not unusual for it to take 30ms to write each file, so if you have 1000 indices then that'd be 30 seconds.

Is there any scope for moving to dedicated master nodes with faster storage?
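
For what it's worth, a dedicated master-eligible node in 7.x is configured roughly like this in elasticsearch.yml (a sketch; exactly which role settings you need depends on what you have enabled):

  node.master: true
  node.data: false
  node.ingest: false
  node.ml: false    # if machine learning is enabled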

In the meantime, you can increase this timeout by setting cluster.join.timeout to a longer time. It defaults to 60 seconds, but at the speeds we're seeing here you might want to try 120 seconds instead.
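
For example, in elasticsearch.yml on each master-eligible node:

  cluster.join.timeout: 120s    # default is 60s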

Do you have the disks all mounted separately, at different data paths? If so, it's one file per index per data path, so 250 indices would be enough to exceed 30 seconds: 250 indices * 4 data paths * 30ms == 30 seconds.

Yes, I do have each disk mounted on a separate data path, so it will be the 250-index threshold!
Just to try to get the cluster up again I have added cluster.join.timeout: 180s, to see if it will start!

I will then look at tuning and sorting this out, if I can only get it up and running again.

Still stuck. Not seeing as many log entries now, but consistently seeing this type of message on all ES nodes:

[2019-09-03T17:02:48,304][WARN ][o.e.c.c.ClusterFormationFailureHelper] [hdp13] master not discovered or elected yet, an election requires at least 4 nodes with ids from [3-dBP8WZQZO-Rymo0vC3mQ, EYrLIFzETzSwcUvvabPegg, jguIm07bTXC_8ot6fYN9bw, 1cT_ZkvzTHaSyiqDEi2GSw, 3BP0A2-JTTS-Vp_1YdnJrw, vp_SiTAYQyCE1TQVIUXx2A, bkVjp2mNSk6G0PH9QFBu9Q], have discovered [{hdp13}{SaA74YaMSNim17koypZiFQ}{OzxQExfXTXSu3qIsj_dGwQ}{192.168.1.213}{192.168.1.213:9300}{dim}{ml.machine_memory=135446564864, xpack.installed=true, ml.max_open_jobs=20}, {hdp15}{vp_SiTAYQyCE1TQVIUXx2A}{lPkmZqQ0TzinbEoK5SnmIw}{192.168.1.215}{192.168.1.215:9300}{dim}{ml.machine_memory=135445331968, ml.max_open_jobs=20, xpack.installed=true}, {hdp14}{bkVjp2mNSk6G0PH9QFBu9Q}{Nf0FW8gMQimBGGTn5-bzeA}{192.168.1.214}{192.168.1.214:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true}, {hdp16}{3BP0A2-JTTS-Vp_1YdnJrw}{eUArLM8cSb2QIGAUEMChzw}{192.168.1.216}{192.168.1.216:9300}{dim}{ml.machine_memory=135446564864, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [192.168.1.214:9300, 192.168.1.215:9300, 192.168.1.216:9300, 192.168.1.223:9302, 192.168.1.224:9302, 192.168.1.225:9302, 192.168.1.226:9302] from hosts providers and [{hdp13}{SaA74YaMSNim17koypZiFQ}{OzxQExfXTXSu3qIsj_dGwQ}{192.168.1.213}{192.168.1.213:9300}{dim}{ml.machine_memory=135446564864, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 6128, last-accepted version 172 in term 6029

The bit that seems odd is the "which is not a quorum", yet all nodes seem to be there?

Shall I give up and go back to 6.x as that was working?

No, I only see 4 of the 8 nodes listed in that message. You need at least 5 of your 8 original nodes to get the cluster back up, as quoted from the docs above.

Hey,

I am just asking: inside your elasticsearch.yml, have you set cluster.initial_master_nodes on each of your master nodes?

On each master node:

cluster.initial_master_nodes:
  - Node-1
  - Node-2

@NewmazN24 all these nodes were part of a cluster prior to the restart, so that setting has no effect. It's only used when forming a brand-new cluster for the first time.

Finally got it all working! Thanks for your help, @DavidTurner.
Then we had a power outage last night, so I had to wait and see if it recovered!! It did.

The major change was to increase two timeouts to account for my poor disk performance:

cluster.join.timeout: 180s
cluster.publish.timeout: 180s
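
(For context: cluster.publish.timeout defaults to 30s, which lines up with the "timed out after 30s" publication failures above, and cluster.join.timeout defaults to 60s.)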

