Cannot create an Elasticsearch cluster across different servers with Docker Compose

I have 2 servers and created one Elasticsearch node on each of them. The docker-compose.yml files on the two servers look like this:

  es0:
    image: elasticsearch:7.6.0
    container_name: es0
    environment:
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200
      - 9300:9300
    volumes:
      - "/mnt/docker/es0/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml"
      - "/mnt/docker/es0/data:/usr/share/elasticsearch/data"
      - "/mnt/docker/es0/plugins:/usr/share/elasticsearch/plugins"
      - "/mnt/docker/es0/config/cert:/usr/share/elasticsearch/config/cert"
  es1:
    image: elasticsearch:7.6.0
    container_name: es1
    environment:
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200
      - 9300:9300
    volumes:
      - "/mnt/docker/es1/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml"
      - "/mnt/docker/es1/data:/usr/share/elasticsearch/data"
      - "/mnt/docker/es1/plugins:/usr/share/elasticsearch/plugins"
      - "/mnt/docker/es1/config/cert:/usr/share/elasticsearch/config/cert"

I configured the elasticsearch.yml files like this:

cluster.name: hs-cluster
node.name: es-00
node.master: true
node.data: true
http.host: 0.0.0.0
http.port: 9200
transport.host: 0.0.0.0
transport.tcp.port: 9300
#network.host: 0.0.0.0 
network.bind_host: ["192.168.0.2", "101.xx.xx.136"]
network.publish_host: 192.168.0.2

gateway.recover_after_nodes: 1

http.cors.enabled: true 
http.cors.allow-origin: "*"

cluster.initial_master_nodes: ["es-00", "es-01"] 
discovery.seed_hosts: [ "192.168.0.2:9300", "192.168.0.3:9300" ]

bootstrap.memory_lock: true
bootstrap.system_call_filter: false

cluster.name: hs-cluster
node.name: es-01
node.master: true
node.data: true
http.host: 0.0.0.0
http.port: 9200
transport.host: 0.0.0.0
transport.tcp.port: 9300
#network.host: 0.0.0.0 
network.bind_host: ["192.168.0.3", "101.xx.xx.137"]
network.publish_host: 192.168.0.3

gateway.recover_after_nodes: 1

http.cors.enabled: true 
http.cors.allow-origin: "*"

cluster.initial_master_nodes: ["es-00", "es-01"] 
discovery.seed_hosts: [ "192.168.0.2:9300", "192.168.0.3:9300" ]

bootstrap.memory_lock: true
bootstrap.system_call_filter: false

When I run the instances, they both start successfully. But when I call _cluster/state?pretty, they both return this error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

That means they can't find each other. I also tried setting network.host: 0.0.0.0, but the result was the same. Does anyone know the reason for this master_not_discovered_exception? How can I resolve it?

By the way, I can run the cluster on a single server with docker compose, but across different servers it fails. I also ran telnet xxx 9300 on each server, and they all connected.

Then I added networks to docker-compose.yml
and changed the network configuration in elasticsearch.yml:
network.host: 0.0.0.0
network.publish_host: 192.168.0.2
and restarted the instances. Then I got an error message like this:

master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es-00, es-01] to bootstrap a cluster:
have discovered [{es-00}{0laFYaAxRr22MfbxvFZSlw}{tEDB7BzSQ2am1L30XTTcBQ}{172.20.0.2}{172.20.0.2:9300}{dilm}{ml.machine_memory=8201236480, xpack.installed=true, ml.max_open_jobs=20}];
discovery will continue using [192.168.0.2:9300, 192.168.0.3:9300] from hosts providers
and [{es-00}{0laFYaAxRr22MfbxvFZSlw}{tEDB7BzSQ2am1L30XTTcBQ}{172.20.0.2}{172.20.0.2:9300}{dilm}{ml.machine_memory=8201236480, xpack.installed=true, ml.max_open_jobs=20}]
from last-known cluster state; node term 0, last-accepted version 0 in term 0

Your nodes have addresses like 172.20.0.2 but you are trying to discover them at addresses like 192.168.0.2? That seems unlikely to work.

You almost certainly don't want to set network.publish_host, but maybe you want network.host: 192.168.0.2 or network.host: _eth0_ or whatever the interface is called?
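
A rough sketch of what that could look like for the first node, assuming the container can actually see an interface that holds 192.168.0.2 (for example because it shares the host's network stack). The setting names are the ones from the question; dropping the separate http.host, transport.host, bind_host and publish_host lines is my assumption about what "only set network.host" means here, not something confirmed in this thread. The second node would use es-01 and 192.168.0.3.

```yaml
cluster.name: hs-cluster
node.name: es-00

# Bind and publish on the one address the other node uses to reach this node.
# Assumption: 192.168.0.2 is visible from inside the container.
network.host: 192.168.0.2

# No network.publish_host, http.host or transport.host here (assumption):
# when unset, the HTTP and transport addresses follow network.host.

discovery.seed_hosts: ["192.168.0.2:9300", "192.168.0.3:9300"]
cluster.initial_master_nodes: ["es-00", "es-01"]
```

After a restart, the address shown in the "have discovered" log line should match one of the discovery.seed_hosts entries rather than a 172.x container address.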

I see. It's a network problem.
I created the ES cluster with network_mode: host in docker-compose. It is said that host mode is not good for production servers.
I will try to use k8s to create the ES cluster instead. Thank you very much!
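
For anyone finding this later, a minimal sketch of the host-networking compose file for the first server, based on the file in the question. The version line and the trimmed volume list are assumptions for illustration, not the exact file that was used.

```yaml
version: "2.2"   # assumption: any Compose file version that supports these keys
services:
  es0:
    image: elasticsearch:7.6.0
    container_name: es0
    # Share the host's network stack, so Elasticsearch sees 192.168.0.2 directly.
    # Port mappings do not apply in host mode, so there is no "ports:" section.
    network_mode: host
    environment:
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - "/mnt/docker/es0/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml"
      - "/mnt/docker/es0/data:/usr/share/elasticsearch/data"
```

The second server would use the same file with es1 in place of es0. Because the container shares the host's interfaces, ports 9200 and 9300 are opened directly on the host.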