Cannot set up a 3-node Elasticsearch cluster with Docker on 2 EC2 machines

I'm trying to set up 3 nodes on 2 machines with the following configuration,

following the doc: Install Elasticsearch with Docker | Elasticsearch Guide [7.15] | Elastic
(I run this compose file on each machine):

services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
    container_name: es01
    environment:
      - node.name=es01
      - network.publish_host=10.2.0.38
      - cluster.name=my-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - network.host=0.0.0.0
      - network.bind_host=0.0.0.0
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - vit01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
    container_name: es02
    environment:
      - node.name=es02
      - network.publish_host=10.2.0.38
      - cluster.name=my-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - network.host=0.0.0.0
      - network.bind_host=0.0.0.0
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - vit02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
    container_name: es03
    environment:
      - node.name=es03
      - network.publish_host=10.2.0.243
      - cluster.name=my-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - network.host=0.0.0.0
      - network.bind_host=0.0.0.0
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - vit03:/usr/share/elasticsearch/data
    networks:
      - elastic

volumes:
  vit01:
    external: true
  vit02:
    external: true
  vit03:
    external: true

networks:
  elastic:
    driver: bridge

The error I got on both machines

es01    | {"type": "server", "timestamp": "2021-10-16T12:35:38,508Z", "level": "WARN", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "bigid-elasticsearch-cluster", "node.name": "es01", "message": "[connectToRemoteMasterNode[172.24.0.2:9300]] completed handshake with [{es02}{1LkU4MjdQwa2j-hw9IaVlA}{Y6V0srCoTsqehcNbdy8F_g}{10.2.0.38}{10.2.0.38:9300}{cdhilmrstw}{ml.machine_memory=66548318208, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}] but followup connection failed", 
es01    | "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [es02][10.2.0.38:9300] connect_exception",
es01    | "at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:978) ~[elasticsearch-7.10.1.jar:7.10.1]",
es01    | "at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:198) ~[elasticsearch-7.10.1.jar:7.10.1]",
es01    | "at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.10.1.jar:7.10.1]",
es01    | "at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]",
es01    | "at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]",
es01    | "at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]",
es01    | "at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]",
es01    | "at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.10.1.jar:7.10.1]",
es01    | "at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[?:?]",
es01    | "at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[?:?]",
es01    | "at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[?:?]",
es01    | "at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[?:?]",
es01    | "at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[?:?]",
es01    | "at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[?:?]",
es01    | "at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[?:?]",
es01    | "at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]",
es01    | "at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]",
es01    | "at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]",
es01    | "at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) ~[?:?]",
es01    | "at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]",
es01    | "at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]",
es01    | "at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]",
es01    | "at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]",
es01    | "at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]",
es01    | "at java.lang.Thread.run(Thread.java:832) [?:?]",
es01    | "Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.2.0.38/10.2.0.38:9300",
  • TCP port 9300 is open on both machines; I can connect from one to the other with
    telnet 10.2.0.243 9300 and vice versa.

What am I missing?

The documentation you linked is for running all 3 nodes on the same host with docker compose, which is different from what you are trying to do.

Your architecture is a little confusing, can you explain how it works? Where are you running the docker-compose? What are the IP addresses of the instances?

Also, network.publish_host=10.2.0.38 is duplicated in this config: you can't have two different nodes publishing the same IP address and the same port. And your es02 container does not expose any ports at all.
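For example (a hypothetical sketch, not a tested config — the port numbers and the transport.publish_port value are my assumptions): if es01 and es02 are both meant to live on 10.2.0.38, es02 would need its own host port mappings and a distinct published transport port so that remote nodes know which port to dial back:

```yaml
# Sketch only: es02 sharing a host with es01 needs distinct host ports.
# transport.publish_port tells remote nodes which port to connect back on.
  es02:
    environment:
      - network.publish_host=10.2.0.38
      - transport.publish_port=9301
    ports:
      - 9201:9200   # HTTP, 9200 is already taken by es01
      - 9301:9300   # transport, 9300 is already taken by es01
```

The other nodes would then list 10.2.0.38:9301 (not just the hostname es02) in their discovery.seed_hosts.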

Thanks for the reply. Basically I have 2 EC2 machines, IPs 10.2.0.243 and 10.2.0.38, and I want to use docker compose to set up 3 nodes on them: 2 nodes on one machine and the 3rd on the other. Following the doc I created the compose file in my question and ran it on both machines. Maybe I need to split the file between the machines? (If I expose the same port on multiple nodes I get a port-already-in-use error.)

The doc assumes that everything runs on one machine; your architecture is completely different.

If you are running the same docker-compose file on both machines, you are starting 6 nodes: 3 on each EC2 instance.

I do not know much about docker, but I don't think that using the IP address of your host as the publish address of your container will work like that, as the containers run on a separate network created by docker.

From the docker documentation you have this:

Bridge networks apply to containers running on the same Docker daemon host. For communication among containers running on different Docker daemon hosts, you can either manage routing at the OS level, or you can use an overlay network.

Which basically means that your containers will only be able to connect to containers running on the same EC2 instance.

Your issue is more related to your network architecture: you just need to make sure that your containers can talk to each other on the publish address.
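One way to do that (a sketch, assuming running Docker Swarm is an option for you) is an attachable overlay network, created once on a swarm manager and then referenced as external from the compose file on each machine, instead of the per-host bridge network:

```yaml
# Sketch: reference a pre-created attachable overlay network so containers
# on different hosts share one Docker network. Created beforehand on a
# swarm manager with:
#   docker network create -d overlay --attachable elastic
networks:
  elastic:
    external: true
```

Alternatively, `network_mode: host` makes each container bind directly to the EC2 instance's own network interfaces, at the cost of losing container network isolation.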

Maybe the discussion in this post can help a little.

So apart from the bridge networking issue, which I'll change: do I need to split the compose file so that each machine has only its own nodes' configuration? E.g. machine 10.2.0.38 runs only the part that is relevant for es01 and es02, and so on.
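For what it's worth, a split along those lines might look like this (a rough, untested sketch assuming es01/es02 live on 10.2.0.38, es03 on 10.2.0.243, and that the containers can already reach each other across hosts; the seed ports are my assumptions):

```yaml
# Sketch for machine 10.2.0.38 only — es01 shown; es02 would be similar
# but mapped to different host ports (e.g. 9201/9301) and publishing
# transport.publish_port=9301.
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
    environment:
      - node.name=es01
      - cluster.name=my-cluster
      - network.publish_host=10.2.0.38
      # Seed hosts are the *host* IPs and ports, not container names:
      - discovery.seed_hosts=10.2.0.38:9301,10.2.0.243:9300
      - cluster.initial_master_nodes=es01,es02,es03
    ports:
      - 9200:9200
      - 9300:9300
```

The compose file on 10.2.0.243 would then define only es03, with discovery.seed_hosts pointing back at 10.2.0.38:9300 and 10.2.0.38:9301.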

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.