We have a multi-node cluster that includes 3 dedicated master nodes. I wanted to do a normal rolling package (apt) update/upgrade and reboot, but when I restarted the first master-eligible node, it would not rejoin the cluster after the reboot.
ES Version 7.4.2
esm01 - elected master node
esm02 - eligible master node
esm03 - eligible master node - rebooted and won't join the existing cluster.
Config esm03:
cluster.name: cluster01
node.name: esm03
node.attr.rack: virtual
node.master: true
node.data: false
path.data: /es/data
path.logs: /es/logs
http.port: 9200
http.bind_host: X.X.1.42
transport.tcp.port: 9300
transport.bind_host: X.X.2.42
transport.publish_host: X.X.2.42
discovery.seed_hosts: ["esm01", "esm02", "esm03"]
gateway.recover_after_nodes: 5
action.destructive_requires_name: false
transport.tcp.connect_timeout: 120s
In the logs from esm03, all I see is this entry over and over again:
[2020-05-25T17:57:10,637][WARN ][o.e.c.c.ClusterFormationFailureHelper]
[esm03] master not discovered or elected yet, an election requires at least 2
nodes with ids from [BXQ6ct83RDuDqQqJQ-3CIw, L2jT3WjSRqmjInBZm7xgyA,
u5qnw0QZS3WrDFNMKcLQkQ], have discovered [{esm03}
{BXQ6ct83RDuDqQqJQ-3CIw}{fi8SPj2xSpOjkB-dJIHalQ}{X.X.2.42}{X.X.2.42:9300}
{ilm}{ml.machine_memory=8371269632, rack=virtual, xpack.installed=true,
ml.max_open_jobs=20}] which is not a quorum; discovery will continue using
[X.X.1.40:9300, X.X.1.41:9300, 127.0.1.1:9300] from hosts providers and [{esm03}
{BXQ6ct83RDuDqQqJQ-3CIw}{fi8SPj2xSpOjkB-dJIHalQ}{X.X.2.42}{X.X.2.42:9300}
{ilm}{ml.machine_memory=8371269632, rack=virtual, xpack.installed=true,
ml.max_open_jobs=20}] from last-known cluster state; node term 9,
last-accepted version 463803 in term 9
The IDs correspond to the existing master nodes:
L2jT3WjSRqmjInBZm7xgyA - esm01
u5qnw0QZS3WrDFNMKcLQkQ - esm02
BXQ6ct83RDuDqQqJQ-3CIw - esm03
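As a sanity check on the "at least 2 nodes" part of the message: assuming the usual strict-majority rule for the three voting IDs listed in the log, the arithmetic works out as in this small Python sketch. The `quorum` helper is my own illustration, not an Elasticsearch function.

```python
def quorum(voting_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return voting_nodes // 2 + 1

# The three voting IDs named in the log message.
voting_ids = [
    "BXQ6ct83RDuDqQqJQ-3CIw",  # esm03
    "L2jT3WjSRqmjInBZm7xgyA",  # esm01
    "u5qnw0QZS3WrDFNMKcLQkQ",  # esm02
]
# According to the log, esm03 has only discovered itself.
discovered = ["BXQ6ct83RDuDqQqJQ-3CIw"]

needed = quorum(len(voting_ids))
print(needed)                     # 2
print(len(discovered) >= needed)  # False
```

So esm03 seeing only itself is consistent with the "which is not a quorum" wording; it needs to reach at least one of esm01 or esm02.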
I can ping the other ES nodes, and UFW is open for connections on both ports 9200 and 9300. However, I do not see any join attempts in the logs on the other master nodes, esm01 and esm02.
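Since ping only exercises ICMP, a TCP-level check run from esm03 might narrow this down. Below is a minimal Python sketch that resolves each seed host and attempts a connection on the transport port. The function names `resolve`, `can_connect`, and `report` are my own, not part of Elasticsearch, and the 3-second timeout is an arbitrary choice. It should also show whether the host names resolve to the X.X.1.x addresses seen in the log rather than the X.X.2.x network that the transport is bound to on esm03.

```python
import socket

def resolve(host):
    """Return the sorted IP addresses that `host` resolves to (empty list on failure)."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

def can_connect(address, port, timeout=3.0):
    """Return True if a TCP connection to (address, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False

def report(hosts, port=9300):
    """Print, for every address each host resolves to, whether `port` accepts connections."""
    for host in hosts:
        addresses = resolve(host)
        if not addresses:
            print(f"{host}: does not resolve")
        for address in addresses:
            state = "open" if can_connect(address, port) else "unreachable"
            print(f"{host} -> {address}:{port} {state}")
```

Calling `report(["esm01", "esm02"])` on esm03 would print one line per resolved address. An "unreachable" result would point at a firewall or at nothing listening on that address, although it cannot tell the two apart.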
Where do I go from here? Could it be an external firewall blocking traffic?
Sincerely,
Adrian