Master nodes do not detect the other masters after service restart

Hello. I upgraded my existing elasticsearch cluster from 6.8.0 to 7.2.0. My cluster has 3 master instances, and many data instances. After some fighting and googling, I eventually was able to get the masters to behave together normally only after using the cluster.initial_master_nodes setting. After that, everything seemed fine, my data was all there, no apparent problems, all three masters seemed to be logging and behaving with messages I'm used to seeing. :+1:
Until.... I went to restart one of the master services. The master service would not join the cluster again on startup, but the configuration had not changed anywhere. The error message being repeated is:

[WARN ][o.e.c.c.ClusterFormationFailureHelper] [es2-hot-1-master-01] master not discovered or elected yet, an election requires at least 2 nodes with ids from [xwDV5JDSHKeQ1DOj5MDMw, NZkcauwTbWNmNG0jeh2Yw, LxPz1cuR6aRiZI8LLn-bA], have discovered which is not a quorum; discovery will continue using [10.40.250.108:9300, 10.40.250.109:9300] from hosts providers and [{es2-hot-1-master-01}{NZkcauwTbWNmNG0jeh2Yw}{azFFLIyTC-4MqVajv98vw}{10.40.250.107}{10.40.250.107:9300}{rack_id=es2-hot-1, xpack.installed=true, box_type=hot, host_name=es2-hot-1}] from last-known cluster state; node term 13, last-accepted version 14861 in term 13

To get it to successfully join again, all I have to do is stop one of the other two master nodes. Then, it immediately successfully discovers whatever it needs to discover, and everything seems normal again.
I am quite baffled. I have tried many things, such as changing the values of discovery.zen.ping.unicast.hosts to be resolvable names that match the name.node values of the masters. I have tried using discovery.seed_hosts instead of discovery.zen.ping.unicast.hosts. I have tried putting the same resolvable names as values in the cluster.initial_master_nodes list. All these things result in the same behavior.

tl;dr: I can force an election to work with 3 masters in 7.2.0 if I use cluster.initial_master_nodes and restart a couple of the master services. But a master service restarted after this election is unable to discover anything else again until I repeat this same process.

Potentially relevant information: I run multiple elasticsearch instances on each OS, on different ports. The masters occupy ports 9200 and 9300.

Configuration I ended up with in the end in elasticsearch.yml for the masters:
---
cluster.name: es2
discovery.zen.ping.unicast.hosts:
- es2-hot-1-master-01
- es2-hot-2-master-01
- es2-hot-3-master-01
cluster.initial_master_nodes:
- es2-hot-1-master-01
- es2-hot-2-master-01
- es2-hot-3-master-01
discovery.zen.ping_timeout: 60s
http.cors.allow-origin: "/.*/"
http.cors.enabled: true
http.host: 0.0.0.0
http.max_content_length: 500mb
http.port: 9200
network.host: 10.40.250.107
node.attr.box_type: hot
node.attr.host_name: es2-hot-1
node.attr.rack_id: es2-hot-1
node.data: false
node.master: true
node.name: es2-hot-1-master-01
path.data: false
path.logs: "/var/log/elasticsearch/master-01"
thread_pool.write.queue_size: 600
xpack.graph.enabled: false
xpack.ml.enabled: false
xpack.monitoring.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false

My /etc/hosts looks like this on my master hosts, in order to facilitate the short names in the lists:
127.0.0.1 localhost
127.0.1.1 es2-hot-1
10.40.250.107 es2-hot-1-master-01 es2-hot-1-master-01.xyz.lan
10.40.250.108 es2-hot-2-master-01 es2-hot-2-master-01.xyz.lan
10.40.250.109 es2-hot-3-master-01 es2-hot-3-master-01.xyz.lan

You shouldn't be setting cluster.initial_master_nodes in this cluster because it's already formed once, so it's probably best to remove that. This setting is only used the first time the cluster forms.

This tells us you are having a discovery problem. This node is trying to find the other two masters at the listed addresses, but is failing to do so. Either these addresses are wrong (are they?) or else there's something else. If the addresses are correct, can you try setting logger.org.elasticsearch.discovery: TRACE to trace the discovery process and share the resulting logs?

Hello David, thank you for your response.

I got my three master nodes into a happy situation, and then turned on logger.org.elasticsearch.discovery: TRACE on a master node (not the elected master) and restarted it to reproduce the symptom. The follow messages are what get repeated when this master is not able to rejoin the cluster after restart:

[2019-07-24T09:39:50,118][TRACE][o.e.d.PeerFinder         ] [es2-hot-1-master-01] probing master nodes from cluster state: nodes: 
   {es2-hot-1-master-01}{NzZkcauwTbWNmNG0jeh2Yw}{DfXWOnqQSm-4o2RkqNvflw}{10.40.250.107}{10.40.250.107:9300}{rack_id=es2-hot-1, xpack.installed=true, box_type=hot, host_name=es2-hot-1}, local
[2019-07-24T09:39:50,118][TRACE][o.e.d.PeerFinder         ] [es2-hot-1-master-01] startProbe(10.40.250.107:9300) not probing local node
[2019-07-24T09:39:50,119][TRACE][o.e.d.SeedHostsResolver  ] [es2-hot-1-master-01] resolved host [10.40.250.107] to [10.40.250.107:9300]
[2019-07-24T09:39:50,119][TRACE][o.e.d.SeedHostsResolver  ] [es2-hot-1-master-01] resolved host [10.40.250.108] to [10.40.250.108:9300]
[2019-07-24T09:39:50,119][TRACE][o.e.d.SeedHostsResolver  ] [es2-hot-1-master-01] resolved host [10.40.250.109] to [10.40.250.109:9300]
[2019-07-24T09:39:50,119][TRACE][o.e.d.PeerFinder         ] [es2-hot-1-master-01] probing resolved transport addresses [10.40.250.108:9300, 10.40.250.109:9300]
[2019-07-24T09:39:50,119][TRACE][o.e.d.PeerFinder         ] [es2-hot-1-master-01] Peer{transportAddress=10.40.250.108:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-07-24T09:39:50,119][TRACE][o.e.d.PeerFinder         ] [es2-hot-1-master-01] Peer{transportAddress=10.40.250.109:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-07-24T09:39:50,120][TRACE][o.e.d.HandshakingTransportAddressConnector] [es2-hot-1-master-01] [connectToRemoteMasterNode[10.40.250.108:9300]] opening probe connection
[2019-07-24T09:39:50,120][TRACE][o.e.d.HandshakingTransportAddressConnector] [es2-hot-1-master-01] [connectToRemoteMasterNode[10.40.250.109:9300]] opening probe connection
[2019-07-24T09:39:50,126][TRACE][o.e.d.HandshakingTransportAddressConnector] [es2-hot-1-master-01] [connectToRemoteMasterNode[10.40.250.108:9300]] opened probe connection
[2019-07-24T09:39:50,126][TRACE][o.e.d.HandshakingTransportAddressConnector] [es2-hot-1-master-01] [connectToRemoteMasterNode[10.40.250.109:9300]] opened probe connection
[2019-07-24T09:39:50,127][TRACE][o.e.d.HandshakingTransportAddressConnector] [es2-hot-1-master-01] [connectToRemoteMasterNode[10.40.250.109:9300]] handshake successful: {es2-hot-3-es-01}{VY6-FsrURQmIBb-2WzT48Q}{pY_hUI2RTqqjUXWyRctvTQ}{10.40.250.109}{10.40.250.109:9300}{rack_id=es2-hot-3, xpack.installed=true, box_type=hot, host_name=es2-hot-3}
[2019-07-24T09:39:50,127][TRACE][o.e.d.HandshakingTransportAddressConnector] [es2-hot-1-master-01] [connectToRemoteMasterNode[10.40.250.108:9300]] handshake successful: {es2-hot-2-es-01}{CzAcsT2cSLqvgFA3lxuJmA}{jP7C13JpTnSV9obRcD0u0w}{10.40.250.108}{10.40.250.108:9300}{rack_id=es2-hot-2, xpack.installed=true, box_type=hot, host_name=es2-hot-2}
[2019-07-24T09:39:50,128][DEBUG][o.e.d.PeerFinder         ] [es2-hot-1-master-01] Peer{transportAddress=10.40.250.109:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [es2-hot-3-es-01][10.40.250.109:9300] non-master-eligible node found
	at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1.doRun(HandshakingTransportAddressConnector.java:101) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-07-24T09:39:50,128][DEBUG][o.e.d.PeerFinder         ] [es2-hot-1-master-01] Peer{transportAddress=10.40.250.108:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [es2-hot-2-es-01][10.40.250.108:9300] non-master-eligible node found
	at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1.doRun(HandshakingTransportAddressConnector.java:101) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:835) [?:?]

It seems it is able to connect to 9300 on each of the other two masters, but considers them to be "non-master-eligible". What does that mean?

As soon as I restart one of the other two masters in the cluster, the issue immediately corrects itself and all three masters play normally with each other.

Furthermore, with some further testing I found the symptom and solution happen the same way regardless of if I have cluster.initial_master_nodes set or not. So, indeed, that is not actually a factor in this symptom.

Pretty weird, right?
Anyway, why do we think a master would consider its friends to be "non-master-eligible" after a simple service restart?

Hmm, yes, that's very strange. It means that the remote node is reporting that it's not master-eligible, i.e. that node.master: false. But this isn't something that can change while a node is running, so I'm deeply puzzled.

Can you share the output of GET / from each node (i.e. curl http://10.40.250.107:9200/, curl http://10.40.250.108:9200/ and curl http://10.40.250.109:9200/)? Can you also share the output of GET _nodes/plugins?

{
  "name" : "es2-hot-1-master-01",
  "cluster_name" : "es2",
  "cluster_uuid" : "Fo_GtXZiR12uY2Sozf7Cmw",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
root@es2-hot-1:~# curl http://10.40.250.108:9200/
{
  "name" : "es2-hot-2-master-01",
  "cluster_name" : "es2",
  "cluster_uuid" : "Fo_GtXZiR12uY2Sozf7Cmw",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
root@es2-hot-1:~# curl http://10.40.250.109:9200/
{
  "name" : "es2-hot-3-master-01",
  "cluster_name" : "es2",
  "cluster_uuid" : "Fo_GtXZiR12uY2Sozf7Cmw",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

The output of _nodes/plugins is very large, as it includes the plugins installed on each of the data nodes, which there are a lot of. Too large to paste in the forum, here. Is there something specific you're looking for in that output?

Ah right try GET _nodes/master:true/plugins to share just the master-eligible nodes. You can share it using https://gist.github.com if it's still too big to fit here.

_nodes/master:true/plugins here:

Thanks, that helps. Going back to my earlier post:

have discovered [] which is not a quorum; discovery will continue using [10.40.250.108:9300, 10.40.250.109:9300] from hosts providers
Either these addresses are wrong (are they?) ...

These addresses are indeed wrong:

      "transport_address": "10.40.250.108:9302",
...
      "transport_address": "10.40.250.109:9303",
...
      "transport_address": "10.40.250.107:9300",

Note the discrepancy in ports. I am guessing you're running multiple nodes on these machines and not ensuring that the masters have a fixed, known port, so it's picking the first free one above 9300 (the default behaviour) and this prevents discovery from working properly. Fix the ports of your master nodes by setting transport.port: 9300 on those nodes and transport.port: 9301-9400 on the data nodes.

David, thank you very much, that is an astute observation and indeed the cause of my problem. I was not defining transport.port: 9300 in my elasticsearch.yml for my master instances, and they would land on random ports above 9300. When I set transport.port: 9300 (and restarted other instances as appropriate to make 9300 available) my situation is solved.

This seems to be a regression? Previous versions of elasticsearch worked fine without defining transport.port: 9300. I'm happy to set it in my clusters, but this might bite others who upgrade from previous versions who also run multiple es instances per OS.

Thanks again!

It's questionable whether this is really a regression - the config here would have been broken in earlier versions too, although you'd have to try a bit harder to see that. For instance if you'd stopped all the nodes listening on port 9300 (one on each host) and then started them up again then you would have seen the same effect. And we've recommended listing the master-eligible nodes in older docs too (e.g. 5.5):

It is recommended that the unicast hosts list be maintained as the list of master-eligible nodes in the cluster.

It's perhaps unclear that by "node" we mean "address and port" here. I've opened a PR to add a note to the breaking changes docs for 7.0 to clarify this anyway:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.