Elasticsearch cluster setup on 3 nodes

I am trying to set up an Elasticsearch cluster on 3 nodes with an AWS ELB.

    docker run \
    -p 9200:9200 \
    -p 9300:9300 \
    --name es-master1 \
    -v es-master1:/usr/share/elasticsearch/data \
    -v es-master1_log:/var/log/elasticsearch \
    -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
    -e "node.master=true" \
    -e "node.ingest=true" \
    -e "node.data=true" \
    -e "node.name=ip-10-6-175-54.cloud.dev.net" \
    -e "cluster.name=main-cluster" \
    -e "network.host=0.0.0.0" \
    -e "cluster.initial_master_nodes=ip-10-6-175-54.cloud.dev.net,ip-10-6-175-188.cloud.dev.net,ip-10-6-175-164.cloud.dev.net" \
    -e "discovery.seed_hosts=10.6.175.188:9300,10.6.175.164:9300" \
    -e "discovery.zen.minimum_master_nodes=2" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=true" \
    artifactory.global.xyz/elasticsearch/elasticsearch:7.4.2

But I get the warning message below, and the cluster does not form.

    {"type": "server", "timestamp": "2019-11-12T12:42:30,807Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-54.cloud.dev.net", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [ip-10-6-175-54.cloud.dev.net, ip-10-6-175-188.cloud.dev.net, ip-10-6-175-164.cloud.dev.net] to bootstrap a cluster: have discovered [{ip-10-6-175-54.cloud.dev.net}{9AX2HGjcRkSGbSyfpQxmEQ}{RXc3q5aiSDS18Z5jlXFeGw}{172.17.0.3}{172.17.0.3:9300}{dilm}{ml.machine_memory=7937126400, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.6.175.188:9300, 10.6.175.164:9300] from hosts providers and [{ip-10-6-175-54.cloud.dev.net}{9AX2HGjcRkSGbSyfpQxmEQ}{RXc3q5aiSDS18Z5jlXFeGw}{172.17.0.3}{172.17.0.3:9300}{dilm}{ml.machine_memory=7937126400, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

I can telnet to the other two instances on port 9200 (telnet ip 9200).
Is there any configuration missing? Can someone please help me understand the issue here?

Hi Team,
I have added environment variables for TRACE logging:

    docker run \
    -p 9200:9200 \
    -p 9300:9300 \
    -h es-master1 \
    --name es-master1 \
    -v es-master1:/usr/share/elasticsearch/data \
    -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
    -e "node.master=true" \
    -e "node.ingest=true" \
    -e "node.data=true" \
    -e "node.name=10.6.175.170" \
    -e "cluster.name=main-cluster" \
    -e "network.host=0.0.0.0" \
    -e "cluster.initial_master_nodes=10.6.175.170,10.6.175.184,10.6.175.52" \
    -e "discovery.seed_hosts=10.6.175.170:9300,10.6.175.184:9300,10.6.175.52:9300" \
    -e "discovery.zen.minimum_master_nodes=2" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=true" \
    -e "logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService=TRACE" \
    -e "logger.org.elasticsearch.discovery=TRACE" \
    artifactory.global.xyz/elasticsearch/elasticsearch:7.4.2

and got the output below. I am running Elasticsearch as a Docker container and am not sure how to capture the full log, so here is a fragment (part 1):

    {"type": "server", "timestamp": "2019-11-13T13:41:39,404Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] opened probe connection" }
        {"type": "server", "timestamp": "2019-11-13T13:41:39,408Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] handshake successful: {10.6.175.170}{VddPVHVCQn6lYfFLvkpIPQ}{UXGG7RujTD68yX3ByBJlCQ}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=7937118208, ml.max_open_jobs=20, xpack.installed=true}" }
        {"type": "server", "timestamp": "2019-11-13T13:41:39,408Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "Peer{transportAddress=10.6.175.170:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
        "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [10.6.175.170][172.17.0.2:9300] local node found",
        "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:105) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:94) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:40) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:456) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1110) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:221) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.4.2.jar:7.4.2]",
        "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.4.2.jar:7.4.2]",
        "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
        "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
        "at java.lang.Thread.run(Thread.java:830) [?:?]"] }
        {"type": "server", "timestamp": "2019-11-13T13:41:40,392Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "probing master nodes from cluster state: nodes: \n   {10.6.175.170}{VddPVHVCQn6lYfFLvkpIPQ}{UXGG7RujTD68yX3ByBJlCQ}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=7937118208, xpack.installed=true, ml.max_open_jobs=20}, local\n" }
        {"type": "server", "timestamp": "2019-11-13T13:41:40,392Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "startProbe(172.17.0.2:9300) not probing local node" }
        {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "resolved host [10.6.175.170:9300] to [10.6.175.170:9300]" }
        {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "resolved host [10.6.175.184:9300] to [10.6.175.184:9300]" }
        {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "resolved host [10.6.175.52:9300] to [10.6.175.52:9300]" }

Log output, part 2:

    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "probing resolved transport addresses [10.6.175.170:9300, 10.6.175.184:9300, 10.6.175.52:9300]" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "Peer{transportAddress=10.6.175.170:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] opening probe connection" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,398Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] opened probe connection" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,404Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] handshake successful: {10.6.175.170}{VddPVHVCQn6lYfFLvkpIPQ}{UXGG7RujTD68yX3ByBJlCQ}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=7937118208, ml.max_open_jobs=20, xpack.installed=true}" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,405Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "Peer{transportAddress=10.6.175.170:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
    "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [10.6.175.170][172.17.0.2:9300] local node found",
    "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:105) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:94) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:40) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:456) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1110) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:221) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]"] }

This node is trying to connect to the other two nodes at addresses 10.6.175.188:9300 and 10.6.175.164:9300 but is not discovering them there. I also note that this node claims its address is 172.17.0.3 which I think is different from what you expect. This looks like a network configuration issue to me.
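
One way to confirm which transport address a node advertises to its peers is to ask the node itself through the nodes info API. The following is a minimal sketch, assuming the HTTP port 9200 is reachable from where you run it and that security is disabled (as in the command above); the function name `show_publish_address` is only illustrative.

```shell
# Print the transport publish address that the local node advertises.
# $1 is the host to query; it defaults to localhost (an assumption).
show_publish_address() {
  curl -s "http://${1:-localhost}:9200/_nodes/_local/transport?filter_path=nodes.*.transport.publish_address&pretty"
}

# Example (run on the Docker host of the node in question):
# show_publish_address localhost
```

If the reported address is in the 172.17.x.x range, the node is publishing its Docker bridge address, which the other EC2 instances cannot reach.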

That IP is a Docker address. If the container is run with network_mode: "host", the node will report the IP of the EC2 instance instead of the local container IP.

This won't resolve the underlying issue, though. That's still something I am looking into myself.

Hi @Wayne_Taylor Thanks for your response.

Please update here if you find a solution for this problem.

Hi @DavidTurner Thanks for your response.

After I added these two environment variables to the docker command:

    -e "network.publish_host=10.6.175.170" \
    -e "network.bind_host=0.0.0.0" \

I get the messages below:

    {"type": "server", "timestamp": "2019-11-14T17:14:32,394Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-170.cloud.dev.net", "message": "probing resolved transport addresses [10.6.175.184:9300, 10.6.175.52:9300]" }
    {"type": "server", "timestamp": "2019-11-14T17:14:32,438Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-170.cloud.dev.net", "message": "Peer{transportAddress=10.6.175.52:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
    "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [][10.6.175.52:9300] connect_timeout[3s]",
    "at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:982) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]"] }
    {"type": "server", "timestamp": "2019-11-14T17:14:32,438Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-170.cloud.dev.net", "message": "Peer{transportAddress=10.6.175.184:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
    "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [][10.6.175.184:9300] connect_timeout[3s]",
    "at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:982) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]"] }

Here is the full docker command:

    docker run \
    -p 9200:9200 \
    -p 9300:9300 \
    -h es-master1 \
    --name es-master1 \
    -v es-master1:/usr/share/elasticsearch/data \
    -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
    -e "node.master=true" \
    -e "node.ingest=true" \
    -e "node.data=true" \
    -e "node.name=ip-10-6-175-170.cloud.dev.net" \
    -e "cluster.name=main-cluster" \
    -e "network.host=10.6.175.170" \
    -e "network.publish_host=10.6.175.170" \
    -e "network.bind_host=0.0.0.0" \
    -e "cluster.initial_master_nodes=ip-10-6-175-170.cloud.dev.net,ip-10-6-175-184.cloud.dev.net,ip-10-6-175-52.cloud.dev.net" \
    -e "discovery.seed_hosts=10.6.175.170:9300,10.6.175.184:9300,10.6.175.52:9300" \
    -e "discovery.zen.minimum_master_nodes=2" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=true" \
    -e "logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService=TRACE" \
    -e "logger.org.elasticsearch.discovery=TRACE" \
    artifactory.xyz.com/elasticsearch/elasticsearch:7.4.2

Again, this looks like a networking issue. The logs show that this node is attempting to connect to the addresses shown, and those attempts are timing out.
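
Since discovery uses the transport port (9300) rather than the HTTP port (9200), a successful telnet to 9200 does not show that the nodes can reach each other for cluster formation. The sketch below probes port 9300 on each peer from the shell. It assumes `bash` and the coreutils `timeout` command are available on the host; the function names are illustrative, and the 3 second limit simply mirrors the `connect_timeout[3s]` in the log.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no telnet or nc is needed.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report whether the transport port (9300) is reachable on each given host.
check_transport() {
  for host in "$@"; do
    if port_open "$host" 9300; then
      echo "$host:9300 reachable"
    else
      echo "$host:9300 NOT reachable"
    fi
  done
}

# Example: run on 10.6.175.170 against the other two nodes.
# check_transport 10.6.175.184 10.6.175.52
```

If a peer is reported as not reachable here while port 9200 works, something between the instances (a firewall or security group, for example) is likely blocking 9300.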

Hi @DavidTurner,

We finally got the Elasticsearch cluster working.

The issue was that the security group attached to the EC2 instances did not allow traffic on port 9300 at the instance level.

Thanks for your support.
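
For anyone hitting the same problem, a rule like the following opens the transport port between the nodes. This is only a sketch of one possible fix, assuming all three instances share a single security group and that the AWS CLI is configured; `sg-EXAMPLE` is a hypothetical placeholder and the function name is illustrative.

```shell
# Hypothetical placeholder: replace with the security group attached to the
# three EC2 instances.
ES_SG_ID="sg-EXAMPLE"

# Allow inbound TCP 9300 (Elasticsearch transport) from members of the same
# security group, so the nodes can reach each other for discovery.
allow_transport_port() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 9300 \
    --source-group "$1"
}

# Example:
# allow_transport_port "$ES_SG_ID"
```

Using the group itself as the source keeps port 9300 closed to everything outside the cluster, which is usually preferable to opening it to a wide CIDR range.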