Running a cluster in Docker on 2 hosts

Hi.
I have 2 hosts and 2 docker-compose files, one per host:

version: '3.6'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.1
    environment:
      - node.name=predprod-rumcelk1   # predprod-rumcelk0 on the second host
      - cluster.name=good-cluster
      - "network.host=0.0.0.0"
      - discovery.zen.ping.unicast.hosts=["ip_one_host", "ip_two_host"]
      - "cluster.initial_master_nodes=predprod-rumcelk1,predprod-rumcelk0"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "path.repo=/opt/elasticsearch/snapshots"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    restart: "unless-stopped"
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      elastic:
        ipv4_address: 10.5.0.6
networks:
  elastic:
    driver: bridge
    ipam:
      config:
        - subnet: 10.5.0.0/16

And I'm getting this in the logs:

master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [predprod-rumcelk0, predprod-rumcelk1] to bootstrap a cluster: have discovered [{predprod-rumcelk0}{T4py-ijYRuWsP39mdIO6-Q}{JGzZOYMoQSm_8Pigbd_s1A}{10.5.0.6}{10.5.0.6:9300}{dilm}{ml.machine_memory=3857117184, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [ip_two_host:9300, ip_one_host:9300] from hosts providers and [{predprod-rumcelk0}{T4py-ijYRuWsP39mdIO6-Q}{JGzZOYMoQSm_8Pigbd_s1A}{10.5.0.6}{10.5.0.6:9300}{dilm}{ml.machine_memory=3857117184, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

From the documentation: This message shows the node names master-a.example.com and master-b.example.com as well as the cluster.initial_master_nodes entries master-a and master-b, and it is clear from this message that they do not match exactly.

How do I specify the node names for the Docker containers?

this node must discover master-eligible nodes [predprod-rumcelk0, predprod-rumcelk1] to bootstrap a cluster: have discovered [{predprod-rumcelk0}{T4py-ijYRuWsP39mdIO6-Q}{JGzZOYMoQSm_8Pigbd_s1A}{10.5.0.6}{10.5.0.6:9300}{dilm}{ml.machine_memory=3857117184, xpack.installed=true, ml.max_open_jobs=20}]

This node has discovered itself (!) but has not discovered the other node, so they cannot form a cluster. It is trying these addresses:

discovery will continue using [ip_two_host:9300, ip_one_host:9300]

Normally this means either that the other node isn't available at the expected address, that it's using the wrong internal address (does 10.5.0.6 look right to you?), or that there's a firewall in the way blocking traffic. A good basic connectivity check is to log into one host and run curl http://other_host:9300/. If this returns This is not an HTTP port then connectivity is good; if you get anything else, you need to work on your network first.

Yes, connectivity is good.
ip_two_host:9300 is the host IP, which is forwarded to 10.5.0.6, the Docker container IP (inside the Docker network).

If I run curl inside the container:

root@predprod-rumcelk1:/srv/docker-compose# docker exec -it docker-compose_elasticsearch_1 bash
[root@b04756549ac5 elasticsearch]# curl http://ip_one_host:9200/
{
  "name" : "predprod-rumcelk0",
  "cluster_name" : "good-cluster",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
[root@b04756549ac5 elasticsearch]# curl http://ip_two_host:9200/
{
  "name" : "predprod-rumcelk1",
  "cluster_name" : "good-cluster",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

That's not quite what I said. You need to check connectivity for port 9300, not 9200, and you need to see the response This is not an HTTP port.

Yes, it works.
Did not apply
root@predprod-rumcelk1:~# curl http://10.10.119.171:9300/
This is not an HTTP port
root@predprod-rumcelk0:~# curl http://10.10.119.172:9300/
This is not an HTTP port

These addresses are different from the ones the nodes are using to talk to each other. The nodes are trying to communicate on addresses like 10.5.0.6, not 10.10.119.171.
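The address other nodes connect to is the one each node advertises as its publish address, which inside a bridge network defaults to the container's own address (10.5.0.6 here). A minimal sketch of one way to change that is to set Elasticsearch's `network.publish_host` to the host's IP while keeping the 9300:9300 port mapping, so the advertised address is reachable from the other host. It assumes predprod-rumcelk1 is 10.10.119.172 and predprod-rumcelk0 is 10.10.119.171, which is only inferred from the curl checks above (each host was probing the other). It also uses `discovery.seed_hosts`, the 7.x name for `discovery.zen.ping.unicast.hosts`; the custom `networks` section is left out for brevity.

```yaml
# Sketch for the predprod-rumcelk1 host only.
# Assumption: predprod-rumcelk1 = 10.10.119.172, predprod-rumcelk0 = 10.10.119.171.
# On the other host, swap node.name and set network.publish_host=10.10.119.171.
version: '3.6'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.1
    environment:
      - node.name=predprod-rumcelk1
      - cluster.name=good-cluster
      # Bind to all interfaces inside the container (as in the original file)...
      - "network.host=0.0.0.0"
      # ...but advertise this host's address to other nodes; the 9300:9300
      # mapping below forwards that traffic into the container.
      - network.publish_host=10.10.119.172
      # 7.x name for discovery.zen.ping.unicast.hosts, as a comma-separated list.
      - discovery.seed_hosts=10.10.119.171,10.10.119.172
      - "cluster.initial_master_nodes=predprod-rumcelk1,predprod-rumcelk0"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "path.repo=/opt/elasticsearch/snapshots"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    restart: "unless-stopped"
    ports:
      - 9200:9200
      - 9300:9300
```

Because `network.publish_host` overrides only the advertised address, the node still listens on 0.0.0.0 inside the container, and the log line "discovery will continue using ..." should then show host addresses instead of 10.5.0.6.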

10.10.119.171 is the host address.
What do I need to do to make the nodes use it?

I'm not a Docker networking expert, but it looks like a bridge network is the wrong choice for your setup. From these docs:

Bridge networks apply to containers running on the same Docker daemon host. For communication among containers running on different Docker daemon hosts, you can either manage routing at the OS level, or you can use an overlay network.
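For the overlay-network route mentioned in the quoted docs, here is a minimal sketch. It assumes the two Docker daemons have been joined into a swarm (`docker swarm init` on one host, then the `docker swarm join` command it prints on the other), that ports 2377/tcp, 7946/tcp+udp and 4789/udp are open between the hosts, and that an attachable overlay network was created once on the manager. The network name `es-overlay` and the container names `es-rumcelk0`/`es-rumcelk1` are hypothetical. Because both containers share one network, they can reach each other by container name and no bridge-only address is involved.

```yaml
# Sketch for the predprod-rumcelk1 host; on predprod-rumcelk0 swap
# container_name, node.name and network.publish_host.
# Assumes this was run once on the swarm manager ("es-overlay" is hypothetical):
#   docker network create --driver overlay --attachable es-overlay
version: '3.6'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.1
    # Hypothetical name; containers on the same overlay network resolve
    # each other by container name through Docker's embedded DNS.
    container_name: es-rumcelk1
    environment:
      - node.name=predprod-rumcelk1
      - cluster.name=good-cluster
      - "network.host=0.0.0.0"
      # Advertise the overlay address (resolved from the container name)
      # rather than whichever interface Elasticsearch would pick by default.
      - network.publish_host=es-rumcelk1
      - discovery.seed_hosts=es-rumcelk0,es-rumcelk1
      - "cluster.initial_master_nodes=predprod-rumcelk1,predprod-rumcelk0"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "path.repo=/opt/elasticsearch/snapshots"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    restart: "unless-stopped"
    ports:
      # Only the HTTP port is published; transport traffic stays on the overlay.
      - 9200:9200
    networks:
      - es-overlay
networks:
  es-overlay:
    # Created outside compose (see above), so compose only attaches to it.
    external: true
```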

FWIW I set up a cluster in this state to see how easy it is to diagnose and discovered that we don't really log anything helpful if your config is like this, so I opened https://github.com/elastic/elasticsearch/pull/51304.
