Hi!
First, here's my architecture:
- 1 Docker swarm cluster running 2 Elasticsearch 6.4.2 nodes without any trouble. They're called isbg01 and isbg02. IP: 10.11.0.10
- 1 standalone (non-swarm) Docker host in the same datacentre but on a different VM, running 1 Elasticsearch 6.4.2 node, isbg03. IP: 10.11.0.12
All ports are open (iptables INPUT/OUTPUT/FORWARD policies are set to ACCEPT).
isbg01 and isbg02 form a cluster and work perfectly well. However, when I try to add isbg03 to the cluster, the following message appears in the isbg03 logs:
[2018-10-22T14:42:03,732][WARN ][o.e.d.z.ZenDiscovery ] [isbg03] failed to connect to master [{isbg02}{CLIuWm25TrC6lsNNRL2-0w}{Vvh09X8ISHivXNjSENCJKw}{10.0.2.21}{10.0.2.21:9300}{ml.machine_memory=5182058496, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], retrying...
org.elasticsearch.transport.ConnectTransportException: [isbg02][10.0.2.21:9300] general node connection failure
at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:688) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:542) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:329) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:316) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:507) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:475) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery.access$2500(ZenDiscovery.java:88) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1245) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.2.jar:6.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: java.lang.IllegalStateException: java.lang.InterruptedException
at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:153) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:643) ~[elasticsearch-6.4.2.jar:6.4.2]
... 11 more
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1079) ~[?:?]
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1367) ~[?:?]
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:234) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:69) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:147) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:643) ~[elasticsearch-6.4.2.jar:6.4.2]
... 11 more
Yet there's nothing to be found in isbg02's logs.
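One detail in the warning may be worth isolating: the address isbg03 is trying to reach. As far as I understand the node description format, the last `{host:port}` group is the transport address that isbg02 publishes to the cluster. Here is a minimal Python sketch that pulls it out of the log line; the function name `master_transport_address` is made up for illustration, and the node attributes are trimmed from the sample line.

```python
import re

# First line of the warning from the isbg03 log (node attributes trimmed).
LOG_LINE = (
    "[2018-10-22T14:42:03,732][WARN ][o.e.d.z.ZenDiscovery ] [isbg03] "
    "failed to connect to master [{isbg02}{CLIuWm25TrC6lsNNRL2-0w}"
    "{Vvh09X8ISHivXNjSENCJKw}{10.0.2.21}{10.0.2.21:9300}], retrying..."
)


def master_transport_address(line):
    """Return (node_name, host, port) from a 'failed to connect to master' line.

    Returns None when the line does not match the expected layout.
    """
    match = re.search(
        r"master \[\{([^}]+)\}.*?\{(\d+\.\d+\.\d+\.\d+):(\d+)\}", line
    )
    if match is None:
        return None
    name, host, port = match.groups()
    return name, host, int(port)


print(master_transport_address(LOG_LINE))
```

The extracted address is 10.0.2.21:9300, which is not the 10.11.0.10:9301 address that isbg03 has in its unicast hosts list, so the two are worth comparing.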
To simplify the problem, I've temporarily removed isbg01 and slightly modified the config files to try to connect isbg02 and isbg03 directly.
Here are the configuration files:
cluster.name: isbg
node.name: isbg02
discovery.zen.ping.unicast.hosts: ["10.11.0.12"]
discovery.zen.minimum_master_nodes: 2
path.data: /var/lib/elasticsearch
network.host: 0.0.0.0
.
cluster.name: isbg
node.name: isbg03
discovery.zen.ping.unicast.hosts: ["10.11.0.10:9301"]
network.host: 0.0.0.0
discovery.zen.minimum_master_nodes: 2
Curl-ing the nodes from each other works: port 9200 returns the usual JSON, and the transport ports (9300 for isbg03, 9301 for isbg02) return the "this is not an HTTP port" message.
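The same reachability check can be scripted so that it also covers the address shown in the warning. This is only a sketch in Python: the helper name `can_connect` and the 2-second timeout are my own choices, the first target comes from the unicast hosts setting, and the second is the address printed in the isbg03 log. It should be run from wherever isbg03 actually runs (ideally inside its container).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Addresses taken from the config and the log in this post.
TARGETS = [
    ("10.11.0.10", 9301),  # isbg02 transport port as listed in unicast hosts
    ("10.0.2.21", 9300),   # address isbg02 advertises in the isbg03 warning
]

if __name__ == "__main__":
    for host, port in TARGETS:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

If the first target is reachable but the second is not, that would suggest the initial ping succeeds while the follow-up connection to the advertised address fails.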
I'm running out of ideas here. The firewall doesn't seem to be blocking anything, and all nodes are running the same ES version.
Any ideas on how to solve this?
Thanks a lot!