Elasticsearch cluster across multiple docker hosts

Hello,
I'm trying to create a multi-node cluster across multiple VMs running Docker.
So far I have set up the master node on my local Docker engine. When the second node is in the same Docker network, there is no issue connecting them.

elasticsearch.yml of the master node

xpack.security.enabled: true
cluster.name: my-elastic-cluster
cluster.initial_master_nodes: [ es01 ]
node.name: es01
transport.host: 0.0.0.0
xpack.security.http.ssl.key: certs/es01/es01.key
xpack.security.http.ssl.certificate: certs/es01/es01.crt
xpack.security.http.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.key: certs/es01/es01.key
xpack.security.transport.ssl.certificate: certs/es01/es01.crt
xpack.security.transport.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.verification_mode: certificate

elasticsearch.yml of the second node

xpack.security.enabled: true
cluster.name: my-elastic-cluster
cluster.initial_master_nodes: [ es01 ]
discovery.seed_hosts: 
  - es01
node.name: es02
transport.host: es02
node.roles:
  - transform
  - data_frozen
  - remote_cluster_client
  - data
  - ml
  - data_content
  - data_hot
  - data_warm
  - data_cold
  - ingest
xpack.security.http.ssl.key: certs/es02/es02.key
xpack.security.http.ssl.certificate: certs/es02/es02.crt
xpack.security.http.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.key: certs/es02/es02.key
xpack.security.transport.ssl.certificate: certs/es02/es02.crt
xpack.security.transport.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.verification_mode: certificate

And the docker-compose

services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.10
    ports:
      - "0.0.0.0:9200:9200"
      - "0.0.0.0:9300:9300"
    environment:
      - "http.host=0.0.0.0"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - ELASTIC_PASSWORD=somepass
    volumes:
      - ./elasticsearch/config/es01/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
      - elastic-data:/usr/share/elasticsearch/data
      - ./certs:/usr/share/elasticsearch/config/certs
    extra_hosts:
      - "host.docker.internal:host-gateway"
      - "es03:host-gateway"
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.10
    expose:
      - 9200
      - 9300
    environment:
      - "http.host=0.0.0.0"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - ELASTIC_PASSWORD=somepass
    volumes:
      - ./elasticsearch/config/es02/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
      - elastic-data:/usr/share/elasticsearch/data
      - ./certs:/usr/share/elasticsearch/config/certs
    extra_hosts:
      - "host.docker.internal:host-gateway"
      - "es03:host-gateway"

elasticsearch.yml of the 3rd node

xpack.security.enabled: true
cluster.name: my-elastic-cluster
cluster.initial_master_nodes: [ es01 ]
discovery.seed_hosts:
  - es01:9302
node.name: es03
node.master: false
transport.port: 9300
transport.host: 0.0.0.0
xpack.security.http.ssl.key: certs/es03/es03.key
xpack.security.http.ssl.certificate: certs/es03/es03.crt
xpack.security.http.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.key: certs/es03/es03.key
xpack.security.transport.ssl.certificate: certs/es03/es03.crt
xpack.security.transport.ssl.certificate_authorities: certs/ca/ca.crt
xpack.security.transport.ssl.verification_mode: certificate

and its docker-compose

services:
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.10
    ports:
      - "0.0.0.0:19200:9200"
      - "0.0.0.0:19300:9300"
    environment:
      - "http.host=0.0.0.0"
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      - ELASTIC_PASSWORD=somepass
    volumes:
      - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
      - elastic-data:/usr/share/elasticsearch/data
      - ./certs:/usr/share/elasticsearch/config/certs

    restart: unless-stopped
    extra_hosts:
      - "host.docker.internal:host-gateway"
      - "es01:host-gateway"

I used a VM to which I created an SSH tunnel with the command ssh my-vm -L 0:19300:127.0.0.1:19300 -R 0:9302:127.0.0.1:9300. To explain the tunnel:

  • -L listens on my local port 19300 and forwards it to the VM's port 19300.
  • -R forwards traffic from the VM's port 9302 to my local port 9300, which is used by the Elasticsearch master node.
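
For reference, the same tunnel could also be kept in ~/.ssh/config instead of typing the flags each time. A sketch, assuming the host alias my-vm from the command above; note that binding the remote side to all interfaces (so a container can reach the VM's port 9302 via the gateway address, not just via the VM's loopback) additionally requires GatewayPorts yes or GatewayPorts clientspecified in the VM's sshd_config:

```
Host my-vm
    # -L equivalent: listen on local port 19300, forward to the VM's 127.0.0.1:19300
    LocalForward 0.0.0.0:19300 127.0.0.1:19300
    # -R equivalent: listen on the VM's port 9302, forward to local 127.0.0.1:9300
    RemoteForward 0.0.0.0:9302 127.0.0.1:9300
```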

The es01 container has IP 172.23.0.3 (assigned by Docker) and es03 has IP 172.18.0.2; they are on different hosts. es03 is able to connect to es01 (the master node), but then, I guess, es01 sends its publish_host address and port and the connection fails.
The whole message:

[connectToRemoteMasterNode[172.18.0.1:19300]] completed handshake with [{es01}{tigqAqNMRE6L4GNKXZF9lw}{OLUAlRl7TwmL_zwD96FCxA}{172.23.0.3}{172.23.0.3:9300}{cdfhilmrstw}{ml.machine_memory=33359724544, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=2147483648, transform.node=true}] but followup connection failed

The 172.18.0.1 is the gateway address, so basically the address of the host.

Can I just set the correct IP:port of the master node that es03 should use to communicate with it? Or do I maybe have to use some iptables rules?

Hi Janusz and welcome!

I'd suggest looking at these docs, particularly the section on binding and publishing, to better understand how Elasticsearch needs its network to be configured. Basically, it looks like your nodes are not accessible at their transport publish addresses. It can be tricky to get everything lined up correctly in Docker, since it all depends on exactly how your environment is configured.

In particular, the followup connection failed message indicates that node es01 is accessible at 172.18.0.1:19300, but it thinks its own address is 172.23.0.3:9300, and it is not accessible at that address. You either need to make it accessible at that address, or else adjust its publish address to one at which it is accessible.
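
The second option could look something like this in es01's elasticsearch.yml. This is only a sketch with placeholder values, not a definitive fix; the right host and port depend on your tunnels, and keep in mind the single published address has to be reachable by every other node in the cluster, not just es03:

```yaml
# Hypothetical sketch: advertise the address/port at which the OTHER
# nodes can actually reach this node, instead of the container-internal
# 172.23.0.3:9300. <reachable-host> is a placeholder for whatever
# endpoint your tunnel exposes towards the remote nodes.
transport.publish_host: <reachable-host>
transport.publish_port: 9302
```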

Thank you for your answer. So if I understand correctly, the master node has to be accessible to every node through its publish_host. If one node is behind some port forwarding or NAT, it will fail to connect to the master node.

Let me confirm one more thing. Do I understand correctly that even though we can set multiple addresses in transport.publish_host, only one will be chosen by the master node, and every node should be able to connect to the master through this address?

Not just the master node: all nodes have to be accessible to all other nodes at their transport publish addresses.
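
Applied to this setup, that means es03 needs a reachable publish address too. A hypothetical sketch for es03's elasticsearch.yml, assuming the other nodes reach it through the -L tunnel plus the 19300:9300 port mapping from its compose file (the placeholder is whatever name or IP resolves to that endpoint on the other hosts):

```yaml
# Sketch only: es03 must also advertise an address the other nodes can
# reach, e.g. the tunnelled endpoint rather than its container IP
# 172.18.0.2, which is meaningless outside its own Docker host.
transport.publish_host: <address-reachable-from-es01>
transport.publish_port: 19300
```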

It's not the master node that chooses it, but yes, Elasticsearch will choose one address to be its publish address. See the note in these docs, particularly:

Ensure each node is accessible at all possible publish addresses

It's almost certainly a terrible idea to specify more than one publish address. At best it'll be very confusing, but normally it leads to unrepeatable and very hard-to-diagnose issues.

Thank you for the reply.