Hello. I upgraded my existing Elasticsearch cluster from 6.8.0 to 7.2.0. My cluster has 3 master instances and many data instances. After some fighting and googling, I was only able to get the masters to form a cluster together after using the cluster.initial_master_nodes setting. After that, everything seemed fine: my data was all there, there were no apparent problems, and all three masters were logging the messages I'm used to seeing.
Until... I went to restart one of the master services. The restarted master would not rejoin the cluster on startup, even though the configuration had not changed anywhere. The error message being repeated is:
[WARN ][o.e.c.c.ClusterFormationFailureHelper] [es2-hot-1-master-01] master not discovered or elected yet, an election requires at least 2 nodes with ids from [xwDV5JDSHKeQ1DOj5MDMw, NZkcauwTbWNmNG0jeh2Yw, LxPz1cuR6aRiZI8LLn-bA], have discovered which is not a quorum; discovery will continue using [10.40.250.108:9300, 10.40.250.109:9300] from hosts providers and [{es2-hot-1-master-01}{NZkcauwTbWNmNG0jeh2Yw}{azFFLIyTC-4MqVajv98vw}{10.40.250.107}{10.40.250.107:9300}{rack_id=es2-hot-1, xpack.installed=true, box_type=hot, host_name=es2-hot-1}] from last-known cluster state; node term 13, last-accepted version 14861 in term 13
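The quorum arithmetic in that message can be sketched as follows. This is a minimal illustration, not Elasticsearch's actual code: the function name has_quorum is hypothetical, the node IDs are copied from the log line above, and it assumes the restarted node can count only itself, since the log shows only its own entry from the last-known cluster state.

```python
def has_quorum(voting_ids, discovered_ids):
    """Return (ok, needed, found) for a simple majority check.

    voting_ids: node IDs listed in the voting configuration.
    discovered_ids: node IDs this node currently knows about, itself included.
    """
    needed = len(voting_ids) // 2 + 1  # strict majority, e.g. 2 of 3
    found = len(set(voting_ids) & set(discovered_ids))
    return found >= needed, needed, found


# IDs copied from the warning above.
VOTING_IDS = [
    "xwDV5JDSHKeQ1DOj5MDMw",
    "NZkcauwTbWNmNG0jeh2Yw",
    "LxPz1cuR6aRiZI8LLn-bA",
]

# Assumption: the restarted node (NZkc...) has discovered nobody but itself.
DISCOVERED_IDS = ["NZkcauwTbWNmNG0jeh2Yw"]

ok, needed, found = has_quorum(VOTING_IDS, DISCOVERED_IDS)
print(f"quorum reached: {ok} (found {found}, need {needed})")
```

Under that assumption the node sees 1 of the 3 voting IDs and needs 2, which matches the "at least 2 nodes" wording in the warning.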
To get it to successfully join again, all I have to do is stop one of the other two master nodes. Then, it immediately successfully discovers whatever it needs to discover, and everything seems normal again.
I am quite baffled. I have tried many things, such as changing the values of discovery.zen.ping.unicast.hosts to be resolvable names that match the node.name values of the masters. I have tried using discovery.seed_hosts instead of discovery.zen.ping.unicast.hosts. I have tried putting the same resolvable names as values in the cluster.initial_master_nodes list. All of these result in the same behavior.
tl;dr: I can force an election to work with 3 masters in 7.2.0 if I use cluster.initial_master_nodes and restart a couple of the master services. But a master service restarted after this election is unable to discover anything else again until I repeat this same process.
Potentially relevant information: I run multiple elasticsearch instances on each OS, on different ports. The masters occupy ports 9200 and 9300.
The configuration I ended up with in elasticsearch.yml for the masters:
---
cluster.name: es2
discovery.zen.ping.unicast.hosts:
- es2-hot-1-master-01
- es2-hot-2-master-01
- es2-hot-3-master-01
cluster.initial_master_nodes:
- es2-hot-1-master-01
- es2-hot-2-master-01
- es2-hot-3-master-01
discovery.zen.ping_timeout: 60s
http.cors.allow-origin: "/.*/"
http.cors.enabled: true
http.host: 0.0.0.0
http.max_content_length: 500mb
http.port: 9200
network.host: 10.40.250.107
node.attr.box_type: hot
node.attr.host_name: es2-hot-1
node.attr.rack_id: es2-hot-1
node.data: false
node.master: true
node.name: es2-hot-1-master-01
path.data: false
path.logs: "/var/log/elasticsearch/master-01"
thread_pool.write.queue_size: 600
xpack.graph.enabled: false
xpack.ml.enabled: false
xpack.monitoring.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false
My /etc/hosts looks like this on my master hosts, in order to facilitate the short names in the lists:
127.0.0.1 localhost
127.0.1.1 es2-hot-1
10.40.250.107 es2-hot-1-master-01 es2-hot-1-master-01.xyz.lan
10.40.250.108 es2-hot-2-master-01 es2-hot-2-master-01.xyz.lan
10.40.250.109 es2-hot-3-master-01 es2-hot-3-master-01.xyz.lan
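To confirm that those short names resolve as intended on each master host, a small check along these lines could be run. It is only a sketch: resolve_all is a hypothetical helper name, and port 9300 is taken from the transport addresses in the log above.

```python
import socket


def resolve_all(names, port=9300):
    """Map each host name to the sorted list of addresses it resolves to.

    A name that cannot be resolved maps to an empty list.
    """
    results = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[name] = []
    return results
```

Calling resolve_all(["es2-hot-1-master-01", "es2-hot-2-master-01", "es2-hot-3-master-01"]) on a master host should, given the /etc/hosts entries above, return the 10.40.250.x address for each name; an empty list would mean the name does not resolve there.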