Elasticsearch cluster setup on 3 nodes

I am trying to set up an Elasticsearch cluster on 3 nodes with an AWS ELB.

    docker run \
    -p 9200:9200 \
    -p 9300:9300 \
    --name es-master1 \
    -v es-master1:/usr/share/elasticsearch/data \
    -v es-master1_log:/var/log/elasticsearch \
    -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
    -e "node.master=true" \
    -e "node.ingest=true" \
    -e "node.data=true" \
    -e "node.name=ip-10-6-175-54.cloud.dev.net" \
    -e "cluster.name=main-cluster" \
    -e "network.host=0.0.0.0" \
    -e "cluster.initial_master_nodes=ip-10-6-175-54.cloud.dev.net,ip-10-6-175-188.cloud.dev.net,ip-10-6-175-164.cloud.dev.net" \
    -e "discovery.seed_hosts=10.6.175.188:9300,10.6.175.164:9300" \
    -e "discovery.zen.minimum_master_nodes=2" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=true" \
    artifactory.global.xyz/elasticsearch/elasticsearch:7.4.2

But I am getting the warning message below, and the cluster is not forming:

    {"type": "server", "timestamp": "2019-11-12T12:42:30,807Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-54.cloud.dev.net", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [ip-10-6-175-54.cloud.dev.net, ip-10-6-175-188.cloud.dev.net, ip-10-6-175-164.cloud.dev.net] to bootstrap a cluster: have discovered [{ip-10-6-175-54.cloud.dev.net}{9AX2HGjcRkSGbSyfpQxmEQ}{RXc3q5aiSDS18Z5jlXFeGw}{172.17.0.3}{172.17.0.3:9300}{dilm}{ml.machine_memory=7937126400, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.6.175.188:9300, 10.6.175.164:9300] from hosts providers and [{ip-10-6-175-54.cloud.dev.net}{9AX2HGjcRkSGbSyfpQxmEQ}{RXc3q5aiSDS18Z5jlXFeGw}{172.17.0.3}{172.17.0.3:9300}{dilm}{ml.machine_memory=7937126400, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

I am able to telnet to the other two instances on port 9200 (telnet ip 9200).
Is there any configuration missing? Can someone please help me understand the issue here?
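A successful telnet to port 9200 only shows that the HTTP port is reachable. The nodes discover and talk to each other over the transport port 9300 (the port listed in discovery.seed_hosts), so that is the port worth testing. Below is a minimal bash sketch that probes both ports on every host passed as an argument; it assumes bash's /dev/tcp support and the coreutils timeout command are available, and the script name check_es_ports.sh is only a placeholder.

```shell
#!/usr/bin/env bash
# Probe the Elasticsearch HTTP (9200) and transport (9300) ports on each
# host given as an argument. Uses bash's /dev/tcp pseudo-device, so no
# telnet or nc is needed.

# check_port HOST PORT
# Returns 0 if a TCP connection to HOST:PORT opens within 3 seconds,
# non-zero if it is refused, times out, or the host cannot be resolved.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

for host in "$@"; do
  for port in 9200 9300; do
    if check_port "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done
done
```

On 10.6.175.170, for example, this would be run as ./check_es_ports.sh 10.6.175.184 10.6.175.52. If 9200 answers but 9300 does not, something between the instances is blocking the transport port.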

Hi Team,
I have added environment variables for TRACE logging:

    docker run \
    -p 9200:9200 \
    -p 9300:9300 \
    -h es-master1 \
    --name es-master1 \
    -v es-master1:/usr/share/elasticsearch/data \
    -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
    -e "node.master=true" \
    -e "node.ingest=true" \
    -e "node.data=true" \
    -e "node.name=10.6.175.170" \
    -e "cluster.name=main-cluster" \
    -e "network.host=0.0.0.0" \
    -e "cluster.initial_master_nodes=10.6.175.170,10.6.175.184,10.6.175.52" \
    -e "discovery.seed_hosts=10.6.175.170:9300,10.6.175.184:9300,10.6.175.52:9300" \
    -e "discovery.zen.minimum_master_nodes=2" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=true" \
    -e "logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService=TRACE" \
    -e "logger.org.elasticsearch.discovery=TRACE" \
    artifactory.global.xyz/elasticsearch/elasticsearch:7.4.2

I got the output below (part 1). I am running Elasticsearch as a Docker container and am not sure how to capture the full log, so here is a log fragment:

    {"type": "server", "timestamp": "2019-11-13T13:41:39,404Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] opened probe connection" }
    {"type": "server", "timestamp": "2019-11-13T13:41:39,408Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] handshake successful: {10.6.175.170}{VddPVHVCQn6lYfFLvkpIPQ}{UXGG7RujTD68yX3ByBJlCQ}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=7937118208, ml.max_open_jobs=20, xpack.installed=true}" }
    {"type": "server", "timestamp": "2019-11-13T13:41:39,408Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "Peer{transportAddress=10.6.175.170:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
    "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [10.6.175.170][172.17.0.2:9300] local node found",
    "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:105) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:94) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:40) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:456) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1110) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:221) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]"] }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,392Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "probing master nodes from cluster state: nodes: \n   {10.6.175.170}{VddPVHVCQn6lYfFLvkpIPQ}{UXGG7RujTD68yX3ByBJlCQ}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=7937118208, xpack.installed=true, ml.max_open_jobs=20}, local\n" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,392Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "startProbe(172.17.0.2:9300) not probing local node" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "resolved host [10.6.175.170:9300] to [10.6.175.170:9300]" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "resolved host [10.6.175.184:9300] to [10.6.175.184:9300]" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.SeedHostsResolver", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "resolved host [10.6.175.52:9300] to [10.6.175.52:9300]" }

Log output, part 2:

    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "probing resolved transport addresses [10.6.175.170:9300, 10.6.175.184:9300, 10.6.175.52:9300]" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "Peer{transportAddress=10.6.175.170:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,393Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] opening probe connection" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,398Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] opened probe connection" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,404Z", "level": "TRACE", "component": "o.e.d.HandshakingTransportAddressConnector", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "[connectToRemoteMasterNode[10.6.175.170:9300]] handshake successful: {10.6.175.170}{VddPVHVCQn6lYfFLvkpIPQ}{UXGG7RujTD68yX3ByBJlCQ}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=7937118208, ml.max_open_jobs=20, xpack.installed=true}" }
    {"type": "server", "timestamp": "2019-11-13T13:41:40,405Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "10.6.175.170", "message": "Peer{transportAddress=10.6.175.170:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
    "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [10.6.175.170][172.17.0.2:9300] local node found",
    "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:105) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:94) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:40) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:456) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1110) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:221) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) [elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]"] }
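On capturing the full log: docker logs es-master1 shows everything the container has written to stdout and stderr, so docker logs es-master1 > es-master1.log 2>&1 saves it to a file. To pick the useful part out of a TRACE/DEBUG log like the ones above, a small filter can list each failed discovery probe together with its reason. This is a sketch that assumes the layout seen in these logs, where the exception message sits on the line right after the "connection failed" line; the function name failed_peers is made up for illustration.

```shell
#!/usr/bin/env bash
# failed_peers: read Elasticsearch JSON log lines on stdin and print, for
# every PeerFinder "connection failed" entry, the probed transport address
# followed by the ConnectTransportException reason from the next line.
# Assumes the line layout seen in the 7.4.2 container logs in this thread.
failed_peers() {
  grep -A1 'connection failed' |
    sed -n \
      -e 's/.*transportAddress=\([^,]*\),.*/\1/p' \
      -e 's/.*ConnectTransportException: \(.*\)",$/  \1/p'
}

# Example (hypothetical file name):
#   docker logs es-master1 > es-master1.log 2>&1
#   failed_peers < es-master1.log
```

Run against the part 2 excerpt above, it would print 10.6.175.170:9300 followed by [10.6.175.170][172.17.0.2:9300] local node found: the probe reached the node itself, which advertises its Docker bridge address 172.17.0.2.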

This node is trying to connect to the other two nodes at addresses 10.6.175.188:9300 and 10.6.175.164:9300 but is not discovering them there. I also note that this node claims its address is 172.17.0.3, which I think is different from what you expect. This looks like a network configuration issue to me.

The IP is a Docker IP address. If the container is run with network_mode: "host", the node will report the IP of the EC2 instance instead of the local container IP.

This won't resolve the underlying issue, though. That's still something I am looking into myself.
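Following the host-networking suggestion, here is a sketch of the TRACE-logging command rewritten to use Docker's --network host option (the docker run counterpart of network_mode: "host" in Compose). With host networking the container shares the EC2 instance's network stack, so Elasticsearch should publish the instance's 10.6.175.x address rather than a 172.17.x.x bridge address. The -p and -h flags and the TRACE loggers are left out, and, as noted above, this does not by itself open any ports between the instances. The function name run_es_node is hypothetical.

```shell
#!/usr/bin/env bash
# run_es_node IP: start one Elasticsearch 7.4.2 node sharing the EC2
# host's network stack. IP is this instance's private address and is also
# used as node.name, matching cluster.initial_master_nodes below.
run_es_node() {
  local ip="$1"
  # No -p flags: with --network host Docker discards published ports and
  # Elasticsearch binds directly to the host's interfaces on 9200/9300.
  docker run \
    --network host \
    --name es-master1 \
    -v es-master1:/usr/share/elasticsearch/data \
    -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
    -e "node.master=true" \
    -e "node.ingest=true" \
    -e "node.data=true" \
    -e "node.name=$ip" \
    -e "cluster.name=main-cluster" \
    -e "network.host=$ip" \
    -e "cluster.initial_master_nodes=10.6.175.170,10.6.175.184,10.6.175.52" \
    -e "discovery.seed_hosts=10.6.175.170:9300,10.6.175.184:9300,10.6.175.52:9300" \
    -e "discovery.zen.minimum_master_nodes=2" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=true" \
    artifactory.global.xyz/elasticsearch/elasticsearch:7.4.2
}
```

On 10.6.175.170 this would be invoked as run_es_node 10.6.175.170, and likewise on the other two instances with their own addresses.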

Hi @Wayne_Taylor, thanks for your response.

Please post an update here if you find a solution to this problem.

Hi @DavidTurner, thanks for your response.

After I added these two environment variables to the docker command:

    -e "network.publish_host=10.6.175.170" \
    -e "network.bind_host=0.0.0.0" \

I am getting the message below:

    {"type": "server", "timestamp": "2019-11-14T17:14:32,394Z", "level": "TRACE", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-170.cloud.dev.net", "message": "probing resolved transport addresses [10.6.175.184:9300, 10.6.175.52:9300]" }
    {"type": "server", "timestamp": "2019-11-14T17:14:32,438Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-170.cloud.dev.net", "message": "Peer{transportAddress=10.6.175.52:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
    "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [][10.6.175.52:9300] connect_timeout[3s]",
    "at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:982) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]"] }
    {"type": "server", "timestamp": "2019-11-14T17:14:32,438Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "main-cluster", "node.name": "ip-10-6-175-170.cloud.dev.net", "message": "Peer{transportAddress=10.6.175.184:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
    "stacktrace": ["org.elasticsearch.transport.ConnectTransportException: [][10.6.175.184:9300] connect_timeout[3s]",
    "at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:982) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.4.2.jar:7.4.2]",
    "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
    "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
    "at java.lang.Thread.run(Thread.java:830) [?:?]"] }

Here is the full docker command:

    docker run \
    -p 9200:9200 \
    -p 9300:9300 \
    -h es-master1 \
    --name es-master1 \
    -v es-master1:/usr/share/elasticsearch/data \
    -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
    -e "node.master=true" \
    -e "node.ingest=true" \
    -e "node.data=true" \
    -e "node.name=ip-10-6-175-170.cloud.dev.net" \
    -e "cluster.name=main-cluster" \
    -e "network.host=10.6.175.170" \
    -e "network.publish_host=10.6.175.170" \
    -e "network.bind_host=0.0.0.0" \
    -e "cluster.initial_master_nodes=ip-10-6-175-170.cloud.dev.net,ip-10-6-175-184.cloud.dev.net,ip-10-6-175-52.cloud.dev.net" \
    -e "discovery.seed_hosts=10.6.175.170:9300,10.6.175.184:9300,10.6.175.52:9300" \
    -e "discovery.zen.minimum_master_nodes=2" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=true" \
    -e "logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService=TRACE" \
    -e "logger.org.elasticsearch.discovery=TRACE" \
    artifactory.xyz.com/elasticsearch/elasticsearch:7.4.2

Again, this looks like a networking issue. The logs show that this node is attempting to connect to the addresses shown, and those attempts are timing out.

Hi @DavidTurner,

We finally got the Elasticsearch cluster setup working.

The issue was the security group set for the EC2 instances, which did not expose port 9300 at the instance level.

Thanks for your support.
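For anyone hitting the same problem, the fix was an inbound rule for TCP port 9300 between the instances. Below is a sketch of adding such a rule with the AWS CLI, assuming all three instances share one security group so the rule can reference the group itself as its source; the function name and group ID are placeholders, and the same rule can be added in the EC2 console.

```shell
#!/usr/bin/env bash
# allow_transport_port SG_ID: add an inbound rule to security group SG_ID
# that lets members of that same group reach each other on TCP 9300, the
# Elasticsearch transport port used for discovery and node-to-node traffic.
allow_transport_port() {
  local sg_id="$1"
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 9300 \
    --source-group "$sg_id"
}
```

Usage would be allow_transport_port sg-0123456789abcdef0 (a placeholder ID), after which curl -s 'http://10.6.175.170:9200/_cat/nodes?v' on any node should list all three nodes.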
