Elasticsearch Cluster on Docker Stack Fails

I am trying to create a 3-node Elasticsearch cluster on Docker stack.
I started with the docker-stack.yml file from this repository: https://github.com/deviantony/docker-elk#how-to-scale-out-the-elasticsearch-cluster

I then extended it into this one:

version: '3.3'

services:

  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0
    container_name: es01
    environment:
      - network.host=0.0.0.0
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elk

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0
    container_name: es02
    environment:
      - network.host=0.0.0.0
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elk

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0
    container_name: es03
    environment:
      - network.host=0.0.0.0
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elk

  logstash:
    image: docker.elastic.co/logstash/logstash:7.6.0
    ports:
      - "5000:5000"
      - "9600:9600"
    configs:
      - source: logstash_config
        target: /usr/share/logstash/config/logstash.yml
      - source: logstash_pipeline
        target: /usr/share/logstash/pipeline/logstash.conf
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk
    deploy:
      mode: replicated
      replicas: 1

  kibana:
    image: docker.elastic.co/kibana/kibana:7.6.0
    ports:
      - "5601:5601"
    configs:
      - source: kibana_config
        target: /usr/share/kibana/config/kibana.yml
    networks:
      - elk
    deploy:
      mode: replicated
      replicas: 1

configs:
  elastic_config:
    file: ./elasticsearch/config/elasticsearch.yml
  logstash_config:
    file: ./logstash/config/logstash.yml
  logstash_pipeline:
    file: ./logstash/pipeline/logstash.conf
  kibana_config:
    file: ./kibana/config/kibana.yml

volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elk:
    driver: overlay

My kibana.yml config file looks like this

server.name: kibana
server.host: "0"
elasticsearch.hosts: [ "http://es01:9200" ]
xpack.monitoring.ui.container.elasticsearch.enabled: true

## X-Pack security credentials
#
elasticsearch.username: elastic
elasticsearch.password: changeme

When I deploy the docker stack file locally (docker stack deploy -c ./docker-stack.yml elk) and read the logs of one of the Elasticsearch nodes, I get these errors:

https://pastebin.com/C7jScFkJ

I don't understand what the problem is.
I tried exposing port 9300 as well, but that didn't help.

Needless to say, Kibana doesn't load; it displays this message:

Kibana server is not ready yet

I was already able to launch ELK with one Elasticsearch node and it worked. The problem arises when I try to run multiple nodes.

What am I doing wrong here?

The pertinent bits of logs are here:

"node.name": "es01", "message": "failed to join {es02}{55fLj3lWTTmWCqWUAk3vHg}{FTOdPOtYRROvu_08j1NCJQ}{10.0.7.10}{10.0.7.10:9300}...
"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [es02][10.0.7.10:9300][internal:cluster/coordination/join]",
"Caused by: org.elasticsearch.transport.ConnectTransportException: [es01][10.0.0.87:9300] connect_exception",
...
"Caused by: java.io.IOException: Connection refused: 10.0.0.87/10.0.0.87:9300",

In English, this node (es01) failed to join another node (es02); es01 managed to connect to es02 at 10.0.7.10:9300 but es02 tried to connect back to es01 at 10.0.0.87:9300 and was refused. You need to ensure that all the nodes can connect to each other at these addresses (or change these addresses).

Thank you for replying.
I see that it works when I change the network driver to bridge.
But in QA/Prod we're using overlay.
Do you have any idea how to get it to work with overlay, or why it wouldn't work that way?

It very much looks like a network config issue to me, although the details will depend on how your network is configured.

You need to make sure that every node can connect to every other node at its publish address, which is logged by the node when it starts up. In this case es01's publish address is 10.0.0.87:9300, but this is apparently not accessible to es02. The two possible fixes are either to adjust your network so es02 can talk to es01 at this address, or else change es01's publish address to one that is accessible to es02 (e.g. set network.host to bind to a specific interface).
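
As a rough illustration of the second option, here is a minimal sketch of how es01's environment block might be changed. It keeps the bind address as it is and adds network.publish_host, which is a real Elasticsearch setting, but the interface name eth0 is purely an assumption on my part: you would need to check inside the container (for example with ip addr) which interface actually carries the address on your elk overlay network, and the same change may be needed on the other nodes.

```yaml
services:
  es01:
    environment:
      # Keep binding on all interfaces, as in the original file.
      - network.host=0.0.0.0
      # Assumption: eth0 is the interface attached to the "elk" overlay
      # network in this container. Verify this before relying on it.
      - network.publish_host=_eth0_
```

After redeploying, the publish address that the node logs at startup should be one that the other nodes can reach; if it is not, the interface name is probably wrong for your setup.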
