I'm pulling my hair out trying to figure out why my three Elasticsearch nodes, running as docker-compose containers on 3 separate Ubuntu 22.04 LTS hosts, cannot form a cluster.
The 3 nodes discover each other but fail to elect a master. When I curl to get the list of nodes, I get a "master not discovered exception".
Here are the repeating log statements on the 3 nodes:
node01-master | {"type": "server", "timestamp": "2023-08-04T15:54:14,467Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-cluster-test-73", "node.name": "node01-master", "message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [XSnNy2f2QOmu9ZrAbEqeGA, TWCTna4GRzSk4NQmYLe8aw, CrOUreENRAKBvUHmMSqIlA], have only discovered non-quorum [{node01-master}{CrOUreENRAKBvUHmMSqIlA}{ODyLuSnYSNusYwdWUa-tFg}{172.16.0.152}{172.16.0.152:9307}{cdfhimrstw}, {node02-data}{PjWpoi54RIuHbTbNN2eqzQ}{NnQM-vFpQd6nC-BQa5QFGg}{172.16.0.149}{172.16.0.149:9307}{cdfhimrstw}, {node03-data}{gcWZXovdQGST_aVd1sedhQ}{YYN3f3gbTcS0_mDrfoasGg}{172.16.0.148}{172.16.0.148:9307}{cdfhimrstw}]; discovery will continue using [172.16.0.149:9307, 172.16.0.148:9307] from hosts providers and [{node01-master}{CrOUreENRAKBvUHmMSqIlA}{ODyLuSnYSNusYwdWUa-tFg}{172.16.0.152}{172.16.0.152:9307}{cdfhimrstw}] from last-known cluster state; node term 2, last-accepted version 101 in term 2" }
As you can see, only one of the three expected IDs (node01's own) has been discovered, while at least two are required.
node02-data | {"type": "server", "timestamp": "2023-08-04T15:54:25,534Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-cluster-test-73", "node.name": "node02-data", "message": "master not discovered or elected yet, an election requires two nodes with ids [PjWpoi54RIuHbTbNN2eqzQ, CrOUreENRAKBvUHmMSqIlA], have discovered possible quorum [{node02-data}{PjWpoi54RIuHbTbNN2eqzQ}{NnQM-vFpQd6nC-BQa5QFGg}{172.16.0.149}{172.16.0.149:9307}{cdfhimrstw}, {node01-master}{CrOUreENRAKBvUHmMSqIlA}{ODyLuSnYSNusYwdWUa-tFg}{172.16.0.152}{172.16.0.152:9307}{cdfhimrstw}, {node03-data}{gcWZXovdQGST_aVd1sedhQ}{YYN3f3gbTcS0_mDrfoasGg}{172.16.0.148}{172.16.0.148:9307}{cdfhimrstw}]; discovery will continue using [172.16.0.152:9307, 172.16.0.148:9307] from hosts providers and [{node02-data}{PjWpoi54RIuHbTbNN2eqzQ}{NnQM-vFpQd6nC-BQa5QFGg}{172.16.0.149}{172.16.0.149:9307}{cdfhimrstw}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
As you can see, both of the two expected IDs have been discovered.
node03-data | {"type": "server", "timestamp": "2023-08-04T15:54:35,075Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-cluster-test-73", "node.name": "node03-data", "message": "master not discovered or elected yet, an election requires 2 nodes with ids [gcWZXovdQGST_aVd1sedhQ, CrOUreENRAKBvUHmMSqIlA], have discovered possible quorum [{node03-data}{gcWZXovdQGST_aVd1sedhQ}{YYN3f3gbTcS0_mDrfoasGg}{172.16.0.148}{172.16.0.148:9307}{cdfhimrstw}, {node01-master}{CrOUreENRAKBvUHmMSqIlA}{ODyLuSnYSNusYwdWUa-tFg}{172.16.0.152}{172.16.0.152:9307}{cdfhimrstw}, {node02-data}{PjWpoi54RIuHbTbNN2eqzQ}{NnQM-vFpQd6nC-BQa5QFGg}{172.16.0.149}{172.16.0.149:9307}{cdfhimrstw}]; discovery will continue using [172.16.0.152:9307, 172.16.0.149:9307] from hosts providers and [{node03-data}{gcWZXovdQGST_aVd1sedhQ}{YYN3f3gbTcS0_mDrfoasGg}{172.16.0.148}{172.16.0.148:9307}{cdfhimrstw}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
As you can see, again, both of the two expected IDs have been discovered.
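To check these readings without comparing IDs by eye, here is a minimal Python sketch that pulls the required voting IDs and the discovered node IDs out of one of these messages and reports the overlap. The function names are my own, and the parsing assumes each node is printed as {name}{node id}{ephemeral id}{host}{address}{roles}, which is what the log lines above appear to show; it is not an Elasticsearch API.

```python
import re


def voting_ids(message):
    """Return the IDs an election requires (the list after 'with ids')."""
    match = re.search(r"with ids (?:from )?\[([^\]]*)\]", message)
    return [part.strip() for part in match.group(1).split(",")] if match else []


def discovered_ids(message):
    """Return the node IDs from the 'discovered ... [ ... ];' section."""
    match = re.search(
        r"discovered (?:non-quorum|possible quorum) \[(.*?)\];", message
    )
    if not match:
        return []
    ids = []
    # Nodes are separated by a comma that is followed by an opening brace.
    for node in re.split(r",\s*(?=\{)", match.group(1)):
        fields = re.findall(r"\{([^{}]*)\}", node)
        if len(fields) > 1:
            # Assumed layout: {name}{node id}{ephemeral id}{host}{address}{roles}
            ids.append(fields[1])
    return ids


def matched_voters(message):
    """Return the required voting IDs that also appear among discovered nodes."""
    found = set(discovered_ids(message))
    return [node_id for node_id in voting_ids(message) if node_id in found]


# Message from node01-master above, cut off after the discovered-nodes list.
NODE01_MESSAGE = (
    "master not discovered or elected yet, an election requires at least 2 nodes "
    "with ids from [XSnNy2f2QOmu9ZrAbEqeGA, TWCTna4GRzSk4NQmYLe8aw, CrOUreENRAKBvUHmMSqIlA], "
    "have only discovered non-quorum ["
    "{node01-master}{CrOUreENRAKBvUHmMSqIlA}{ODyLuSnYSNusYwdWUa-tFg}{172.16.0.152}{172.16.0.152:9307}{cdfhimrstw}, "
    "{node02-data}{PjWpoi54RIuHbTbNN2eqzQ}{NnQM-vFpQd6nC-BQa5QFGg}{172.16.0.149}{172.16.0.149:9307}{cdfhimrstw}, "
    "{node03-data}{gcWZXovdQGST_aVd1sedhQ}{YYN3f3gbTcS0_mDrfoasGg}{172.16.0.148}{172.16.0.148:9307}{cdfhimrstw}"
    "]; discovery will continue"
)

print(matched_voters(NODE01_MESSAGE))  # ['CrOUreENRAKBvUHmMSqIlA']
```

For node01 only its own ID matches. The other two IDs it is waiting for (XSnNy2f2QOmu9ZrAbEqeGA and TWCTna4GRzSk4NQmYLe8aw) do not belong to any node it currently sees.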
I don't know how important the last part of each log message is, the bit about the "term" and the "last-accepted version" of the cluster state. The 1st node keeps reporting the same values for those (term 2, version 101). The other two are at 0 for both.
Here is the docker-compose.yaml for the 1st node (the others are of course very similar). The IP addresses you see belong to the docker hosts.
version: '3.7'
services:
  node01-master:
    build: .
    container_name: node01-master
    hostname: es-master
    environment:
      - node.name=node01-master
      - cluster.name=es-cluster-test-73
      - discovery.seed_hosts=172.16.0.152:9307,172.16.0.149:9307,172.16.0.148:9307
      - cluster.initial_master_nodes=node01-master,node02-data,node03-data
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256M -Xmx256M"
      - node.master=true
      - node.voting_only=false
      - node.data=true
      - node.ingest=true
      - node.ml=false
      - xpack.ml.enabled=true
      - cluster.remote.connect=true
      - network.publish_host=172.16.0.152
      - transport.publish_port=9307
      - http.publish_port=9307
    volumes:
      - data01:/usr/share/elasticsearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "9207:9200"
      - "9307:9300"
    networks:
      - elastic_73
    restart: always
volumes:
  data01:
    driver: local
networks:
  elastic_73:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.40.4.1/24
I'd be grateful for any help!
- George