Hello,
I've got a 3-node Elasticsearch cluster running alongside Kibana, all defined in a single docker-compose file. My compose file is below:
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms7g -Xmx7g"
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - "127.0.0.1:9200:9200"
    networks:
      - esnet
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms7g -Xmx7g"
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - esnet
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms7g -Xmx7g"
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - esnet
  kibana:
    image: docker.elastic.co/kibana/kibana:7.5.0
    ports:
      - "5601:5601"
    depends_on:
      - es03
    environment:
      - ELASTICSEARCH_HOSTS=http://es01:9200
    networks:
      - esnet

volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  esnet:
This setup ran perfectly well for a couple of months, until I had to make a change to docker-compose.yml (adding ulimit settings to prevent swapping). Before making the change I brought the containers down with docker-compose down, then added the necessary lines and brought the containers back up.
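For reference, the sequence I ran was roughly the following (a sketch, not a transcript; as far as I understand, docker-compose down without the -v flag should leave the named data volumes in place):

# Stop and remove the containers and the network, but keep named volumes
docker-compose down

# ... edit docker-compose.yml to add the ulimit/memlock settings ...

# Recreate the containers and follow the logs
docker-compose up -d
docker-compose logs -f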
Since bringing them back up, each of the Elasticsearch nodes has been logging warnings about being unable to elect a master. The output is as follows:
es01 | {"type": "server", "timestamp": "2020-07-16T13:50:01,439Z", "level": "WARN", "component": "o.e.c.c.ClusterFor
mationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "master not discovered or elec
ted yet, an election requires at least 2 nodes with ids from [FpSZvU6nRl-2KYRyZJ1i2Q, qrujmnYwS5G9HdZ4SIlweA, XdKCB9MvSr-
8FZMWHpzScg], have discovered [{es01}{qrujmnYwS5G9HdZ4SIlweA}{o2wyby9DSjyBHA-bwtcalg}{192.168.208.2}{192.168.208.2:9300}{
dilm}{ml.machine_memory=33687420928, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will co
ntinue using [192.168.208.3:9300, 192.168.208.4:9300] from hosts providers and [{es01}{qrujmnYwS5G9HdZ4SIlweA}{o2wyby9DSj
yBHA-bwtcalg}{192.168.208.2}{192.168.208.2:9300}{dilm}{ml.machine_memory=33687420928, xpack.installed=true, ml.max_open_j
obs=20}] from last-known cluster state; node term 49, last-accepted version 2490 in term 49" }
es02 | {"type": "server", "timestamp": "2020-07-16T13:50:04,900Z", "level": "WARN", "component": "o.e.c.c.ClusterFor
mationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es02", "message": "master not discovered or elec
ted yet, an election requires at least 2 nodes with ids from [FpSZvU6nRl-2KYRyZJ1i2Q, qrujmnYwS5G9HdZ4SIlweA, XdKCB9MvSr-
8FZMWHpzScg], have discovered [{es02}{XdKCB9MvSr-8FZMWHpzScg}{-t78nWdqQHOanziElOv1fg}{192.168.208.3}{192.168.208.3:9300}{
dilm}{ml.machine_memory=33687420928, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will co
ntinue using [192.168.208.2:9300, 192.168.208.4:9300] from hosts providers and [{es02}{XdKCB9MvSr-8FZMWHpzScg}{-t78nWdqQH
OanziElOv1fg}{192.168.208.3}{192.168.208.3:9300}{dilm}{ml.machine_memory=33687420928, xpack.installed=true, ml.max_open_j
obs=20}] from last-known cluster state; node term 49, last-accepted version 2491 in term 49" }
es03 | {"type": "server", "timestamp": "2020-07-16T13:50:06,281Z", "level": "WARN", "component": "o.e.c.c.ClusterFor
mationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "master not discovered or elec
ted yet, an election requires at least 2 nodes with ids from [FpSZvU6nRl-2KYRyZJ1i2Q, qrujmnYwS5G9HdZ4SIlweA, XdKCB9MvSr-
8FZMWHpzScg], have discovered [{es03}{FpSZvU6nRl-2KYRyZJ1i2Q}{8TvEjBy2SMuNagcP0avCOw}{192.168.208.4}{192.168.208.4:9300}{
dilm}{ml.machine_memory=33687420928, xpack.installed=true, ml.max_open_jobs=20}] which is not a quorum; discovery will co
ntinue using [192.168.208.2:9300, 192.168.208.3:9300] from hosts providers and [{es03}{FpSZvU6nRl-2KYRyZJ1i2Q}{8TvEjBy2SM
uNagcP0avCOw}{192.168.208.4}{192.168.208.4:9300}{dilm}{ml.machine_memory=33687420928, xpack.installed=true, ml.max_open_j
obs=20}] from last-known cluster state; node term 49, last-accepted version 2491 in term 49" }
I reverted the compose file to the previously working version shown above, but after bringing the containers back up I'm still getting the same warnings. I suspect the problem is related to the persisted data volumes: is it possible the cluster is holding on to state from before (such as the node IDs it expects to find for an election), while rebuilding the containers has changed some of that information? Whatever the fix, I need to retain the data in the Elasticsearch cluster and Kibana.
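One check I'm considering is whether the named volumes actually survived the down/up cycle and still hold the old node state. Roughly like this (the myproject_ prefix below is just a placeholder for whatever project name compose is using, so the real volume names may differ, and nodes/0/_state is my assumption about the 7.x data directory layout):

# List the named volumes; compose prefixes them with the project name,
# e.g. myproject_data01 (placeholder name, not my actual project)
docker volume ls | grep data0

# Look inside one volume without starting Elasticsearch; if the old data
# is still there, the node state should live under nodes/0/_state
docker run --rm -v myproject_data01:/esdata alpine ls -R /esdata/nodes/0/_state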
To troubleshoot, I've tried opening a shell in each of the ES containers and curling the other nodes. Using the service names from the docker-compose file with the Elasticsearch ports (9200/9300), the requests get no response. I can, however, ping each node's IP address and get a reply, so the containers do appear to be able to reach one another.
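For reference, the checks looked roughly like this (a sketch of what I ran, from inside es01):

# Open a shell in the es01 container
docker exec -it es01 bash

# HTTP port by service name -- this is what fails for me
curl -s http://es02:9200
curl -s http://es03:9200

# 9300 is the binary transport port, not HTTP, so curl will error even on a
# healthy cluster; the point is whether the name resolves and the TCP
# connection is accepted ("could not resolve host" vs "empty reply")
curl -s http://es02:9300

# Ping against the container IPs works fine
ping -c 1 192.168.208.3
ping -c 1 192.168.208.4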
Happy to provide more information if necessary.