Data node cannot find master node

I have two Windows VMs hosted on Azure. Both have Elasticsearch 7.x installed. One of them is designated as the master node, and the other as a data node.

Here are the contents of the elasticsearch.yml file on the master node:

# ---------------------------------- Cluster -----------------------------------

cluster.name: xr

# ------------------------------------ Node ------------------------------------

node.name: xr-master-node
node.master: true
node.data: true 

# ---------------------------------- Network -----------------------------------

network.host: [_local_, _site_]

# --------------------------------- Discovery ----------------------------------

cluster.initial_master_nodes: xr-master-node

And on the data node:

# ---------------------------------- Cluster -----------------------------------

cluster.name: xr

# ------------------------------------ Node ------------------------------------

node.name: xr-data-node-1
node.master: false
node.data: true

# ---------------------------------- Network -----------------------------------

network.host: [_local_, _site_]

# --------------------------------- Discovery ----------------------------------

discovery.seed_hosts: "10.0.1.4" # This is the private IP address of the master node
cluster.initial_master_nodes: xr-master-node

However, every time I try to start Elasticsearch on the data node, I see this error message, even though Elasticsearch is already running on the master node:

[2019-09-19T10:49:07,567][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xr-data-node-1] master not discovered yet: have discovered [{xr-data-node-1}{B3YtyECXTAC1vw1rfzYGRw}{mfdrFCMNRP-SE2d6XRAN9g}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.0.1.4:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0

On my master node, running netstat -a confirms that it is listening on 10.0.1.4:9300.
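
For reference, this is roughly the check I ran on the master node (output abridged):

netstat -a -n | findstr ":9300"
  TCP    10.0.1.4:9300          0.0.0.0:0              LISTENING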

What am I doing wrong?

Does your network security group (NSG) configuration permit xr-data-node-1 to connect to xr-master-node on port 9300?

Thank you for your reply, @DavidTurner. Yes, I did add an inbound security rule to allow connections on port 9300:

Please let me know if I configured anything wrong.

That doesn't look right, although I am not very familiar with Azure, so maybe there's some subtlety I'm missing. I would expect both the source and destination addresses to be 10.0.1.0/24, and the source port range should be Any.

Edit: also the protocol could reasonably be TCP since that's all that Elasticsearch uses.
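
For example, with the Azure CLI such a rule might look something like this (the resource group and NSG names here are placeholders for your own):

az network nsg rule create --resource-group my-rg --nsg-name my-nsg --name allow-es-transport --priority 100 --direction Inbound --access Allow --protocol Tcp --source-address-prefixes 10.0.1.0/24 --source-port-ranges "*" --destination-address-prefixes 10.0.1.0/24 --destination-port-ranges 9300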

Thanks for your reply. I have adjusted the settings as you suggested, but it didn't work, unfortunately. :frowning: Do you know of anyone else in the Elastic Team who might be able to help?

Are you sure it's giving exactly the same message as before? Can you add the following line to the data node's elasticsearch.yml file, then restart it, and then share the first few minutes of logs here in their entirety?

logger.org.elasticsearch.discovery: TRACE

Thank you for your reply, @DavidTurner. I will split my reply into two separate posts, due to the character limit. Sorry for the text dump, but here are the logs (after plugins are loaded):

[2019-09-24T07:32:12,690][DEBUG][o.e.d.z.ElectMasterService] [xr-data-node-1] using minimum_master_nodes [-1]
[2019-09-24T07:32:29,816][INFO ][o.e.x.s.a.s.FileRolesStore] [xr-data-node-1] parsed [0] roles from file [C:\Users\xr\Downloads\elasticsearch-7.3.2-windows-x86_64\elasticsearch-7.3.2\config\roles.yml]
[2019-09-24T07:32:37,207][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [xr-data-node-1] [controller/3276] [Main.cc@110] controller (64 bit): Version 7.3.2 (Build 265429af874fbe) Copyright (c) 2019 Elasticsearch BV
[2019-09-24T07:32:42,379][DEBUG][o.e.a.ActionModule       ] [xr-data-node-1] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
[2019-09-24T07:32:42,926][DEBUG][o.e.d.SettingsBasedSeedHostsProvider] [xr-data-node-1] using initial hosts [10.0.1.4:9300]
[2019-09-24T07:32:42,957][INFO ][o.e.d.DiscoveryModule    ] [xr-data-node-1] using discovery type [zen] and seed hosts providers [settings]
[2019-09-24T07:32:45,223][INFO ][o.e.n.Node               ] [xr-data-node-1] initialized
[2019-09-24T07:32:45,223][INFO ][o.e.n.Node               ] [xr-data-node-1] starting ...
[2019-09-24T07:32:50,660][INFO ][o.e.t.TransportService   ] [xr-data-node-1] publish_address {10.0.1.5:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}, {10.0.1.5:9300}
[2019-09-24T07:32:50,660][INFO ][o.e.b.BootstrapChecks    ] [xr-data-node-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-09-24T07:32:50,676][DEBUG][o.e.d.SeedHostsResolver  ] [xr-data-node-1] using max_concurrent_resolvers [10], resolver timeout [5s]
[2019-09-24T07:32:50,676][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] activating with nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:50,676][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:56,645][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:56,661][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:56,661][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:56,661][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-09-24T07:32:56,661][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] opening probe connection
[2019-09-24T07:32:57,682][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:57,682][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:57,682][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:58,709][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:58,709][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:58,709][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:59,721][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

Continued:

[2019-09-24T07:32:59,721][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:59,721][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:59,737][DEBUG][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [][10.0.1.4:9300] connect_timeout[3s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:963) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.2.jar:7.3.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-09-24T07:33:00,702][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xr-data-node-1] master not discovered yet: have discovered [{xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.0.1.4:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2019-09-24T07:33:00,733][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:00,733][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:00,733][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:00,733][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-09-24T07:33:00,733][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] opening probe connection
[2019-09-24T07:33:01,750][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:01,750][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:01,750][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:02,768][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:02,768][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:02,768][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:03,764][DEBUG][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [][10.0.1.4:9300] connect_timeout[3s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:963) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.2.jar:7.3.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-09-24T07:33:03,779][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:03,779][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:03,779][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:03,779][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-09-24T07:33:03,779][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] opening probe connection
[2019-09-24T07:33:04,792][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:04,792][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:04,808][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:05,819][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

The same messages are then printed repeatedly without end -- specifically those about probing transport addresses.

Thanks @Miao, that's helpful. This message is telling us there's still a basic connectivity problem preventing this node from connecting to 10.0.1.4:9300. Can you connect to that address from this node using something other than Elasticsearch? E.g. curl http://10.0.1.4:9300/ should return immediately with the message This is not an HTTP port.
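
Since these are Windows VMs, an equivalent check from PowerShell (available on Windows Server 2012 R2 and later) is:

Test-NetConnection -ComputerName 10.0.1.4 -Port 9300

TcpTestSucceeded should come back True if the port is reachable.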

Can you share your current NSG config? Are you sure that this config is being applied to both nodes?

Thank you for your reply. I ran the command curl http://10.0.1.4:9300, and received this response after a short wait:

curl: (7) Failed to connect to 10.0.1.4 port 9300: Timed out

Here is a screenshot of the inbound port rules for my NSG (I know taking a screenshot is not ideal, so I apologise):

Here are the outbound port rules for my NSG:

Here are the network interfaces belonging to the same NSG:

Hi Miao,

Earlier, I also received the same error. Could you please confirm whether you have enabled the parameter xpack.security.enabled: true?

Thank you for your reply, @chandu5565. I did not specify the xpack.security.enabled parameter inside elasticsearch.yml, and its default value is false, according to the Elasticsearch documentation.

@DavidTurner It turned out that the firewall settings on my master node VM did not allow incoming connections on port 9300, so I have fixed that. However, my data node still cannot discover the master node.
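
For reference, the firewall rule I added was along these lines (the rule name is my own):

netsh advfirewall firewall add rule name="Elasticsearch transport" dir=in action=allow protocol=TCP localport=9300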

Interestingly, the logs initially reported that both the handshake and the full connection were successful:

[2019-09-24T10:16:09,739][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] handshake successful: {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}
[2019-09-24T10:16:09,848][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] full connection successful: {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}
[2019-09-24T10:16:09,864][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode={xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-09-24T10:16:09,911][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode={xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received 

Nonetheless, it failed to discover the master node. (More precisely, it discovered the master node, but did not recognise it as master-eligible.)

[2019-09-24T10:16:13,800][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xr-data-node-1] master not discovered yet: have discovered [{xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{P-1woOzkRjqTXaOIA3cAdg}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [10.0.1.4:9300] from hosts providers and [] from last-known cluster state; node term 30, last-accepted version 0 in term 0

Continued:

Finally, an exception was thrown:

[2019-09-24T10:16:30,965][INFO ][o.e.c.c.JoinHelper       ] [xr-data-node-1] failed to join {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{P-1woOzkRjqTXaOIA3cAdg}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=30, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{P-1woOzkRjqTXaOIA3cAdg}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, targetNode={xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [xr-master-node][10.0.1.4:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [xr-data-node-1][10.0.1.5:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:972) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:161) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.3.2.jar:7.3.2]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.3.2.jar:7.3.2]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[transport-netty4-client-7.3.2.jar:7.3.2]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:502) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:495) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:415) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:540) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:533) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:114) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.IOException: Connection timed out: no further information: 10.0.1.5/10.0.1.5:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) ~[?:?]
Caused by: java.io.IOException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) ~[?:?]

Well done on finding the firewall issue!

The exception you are seeing is to do with connectivity in the opposite direction: the master is trying to connect back to the data node and this connection is now timing out. Elasticsearch needs every node to be able to connect to every other node.

@DavidTurner Thank you for your prompt reply! I have also ensured that the firewall on my data node allows incoming connections on port 9300, so the failure to form a cluster is puzzling to me. :frowning:

I think there's some other network-level filtering going on somewhere. I would suggest the same experiment using curl as before, except in the opposite direction.
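
That is, on the master node:

curl http://10.0.1.5:9300/

If the connection is working, it should again answer immediately with This is not an HTTP port.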

@DavidTurner I currently see the following messages on my data node while trying to form a cluster:

[2019-09-24T12:09:43,575][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] not active
[2019-09-24T12:09:44,842][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] deactivating and setting leader to {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}
[2019-09-24T12:09:44,856][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] not active 

These messages are printed repeatedly, without end.

What does it mean to say that xr-data-node-1 is not active? Is it something I should fix?

No, these are TRACE-level logs, intended to be read alongside the source code for low-level debugging only. They don't indicate anything that needs fixing.

@DavidTurner Thank you for your prompt reply! So, presumably, I have successfully created a cluster?

My apologies for the silly questions, and thank you very much for your help so far -- you have been really patient and helpful.