Data node cannot find master node

I have two Windows VMs hosted on Azure. Both have Elasticsearch 7.x installed. One of them is designated as the master node, and the other as a data node.

Here are the contents of the elasticsearch.yml file on the master node:

# ---------------------------------- Cluster -----------------------------------

cluster.name: xr

# ------------------------------------ Node ------------------------------------

node.name: xr-master-node
node.master: true
node.data: true 

# ---------------------------------- Network -----------------------------------

network.host: [_local_, _site_]

# --------------------------------- Discovery ----------------------------------

cluster.initial_master_nodes: xr-master-node

And on the data node:

# ---------------------------------- Cluster -----------------------------------

cluster.name: xr

# ------------------------------------ Node ------------------------------------

node.name: xr-data-node-1
node.master: false
node.data: true

# ---------------------------------- Network -----------------------------------

network.host: [_local_, _site_]

# --------------------------------- Discovery ----------------------------------

discovery.seed_hosts: "10.0.1.4" # This is the private IP address of the master node
cluster.initial_master_nodes: xr-master-node

However, every time I try to start Elasticsearch on the data node, I see this error message, even though Elasticsearch is already running on the master node:

[2019-09-19T10:49:07,567][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xr-data-node-1] master not discovered yet: have discovered [{xr-data-node-1}{B3YtyECXTAC1vw1rfzYGRw}{mfdrFCMNRP-SE2d6XRAN9g}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.0.1.4:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0

On my master node, running netstat -a confirms that it is listening on 10.0.1.4:9300.
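
For reference, this is roughly the check I ran on the master node (output abridged):

netstat -a -n | findstr ":9300"
  TCP    10.0.1.4:9300          0.0.0.0:0              LISTENING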

What am I doing wrong?

Does your network security group (NSG) configuration permit xr-data-node-1 to connect to xr-master-node on port 9300?

Thank you for your reply, @DavidTurner. Yes, I did add an inbound security rule to allow connections on port 9300:

Please let me know if I configured anything wrong.

That doesn't look right, although I am not very familiar with Azure, so maybe there's some subtlety I'm missing. I would expect both the source and destination addresses to be 10.0.1.0/24, and the source port range should be Any.

Edit: also the protocol could reasonably be TCP since that's all that Elasticsearch uses.
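
For example, with the Azure CLI such a rule might look something like this (the resource group and NSG names here are placeholders for your own):

az network nsg rule create --resource-group my-rg --nsg-name my-nsg --name allow-es-transport --priority 100 --direction Inbound --access Allow --protocol Tcp --source-address-prefixes 10.0.1.0/24 --source-port-ranges "*" --destination-address-prefixes 10.0.1.0/24 --destination-port-ranges 9300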

Thanks for your reply. I have adjusted the settings as you suggested, but it didn't work, unfortunately. :frowning: Do you know of anyone else in the Elastic Team who might be able to help?

Are you sure it's giving exactly the same message as before? Can you add the following line to the data node's elasticsearch.yml file, then restart it, and then share the first few minutes of logs here in their entirety?

logger.org.elasticsearch.discovery: TRACE

Thank you for your reply, @DavidTurner. I will split my reply into two separate posts, due to the character limit. Sorry for the text dump, but here are the logs (after plugins are loaded):

[2019-09-24T07:32:12,690][DEBUG][o.e.d.z.ElectMasterService] [xr-data-node-1] using minimum_master_nodes [-1]
[2019-09-24T07:32:29,816][INFO ][o.e.x.s.a.s.FileRolesStore] [xr-data-node-1] parsed [0] roles from file [C:\Users\xr\Downloads\elasticsearch-7.3.2-windows-x86_64\elasticsearch-7.3.2\config\roles.yml]
[2019-09-24T07:32:37,207][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [xr-data-node-1] [controller/3276] [Main.cc@110] controller (64 bit): Version 7.3.2 (Build 265429af874fbe) Copyright (c) 2019 Elasticsearch BV
[2019-09-24T07:32:42,379][DEBUG][o.e.a.ActionModule       ] [xr-data-node-1] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
[2019-09-24T07:32:42,926][DEBUG][o.e.d.SettingsBasedSeedHostsProvider] [xr-data-node-1] using initial hosts [10.0.1.4:9300]
[2019-09-24T07:32:42,957][INFO ][o.e.d.DiscoveryModule    ] [xr-data-node-1] using discovery type [zen] and seed hosts providers [settings]
[2019-09-24T07:32:45,223][INFO ][o.e.n.Node               ] [xr-data-node-1] initialized
[2019-09-24T07:32:45,223][INFO ][o.e.n.Node               ] [xr-data-node-1] starting ...
[2019-09-24T07:32:50,660][INFO ][o.e.t.TransportService   ] [xr-data-node-1] publish_address {10.0.1.5:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}, {10.0.1.5:9300}
[2019-09-24T07:32:50,660][INFO ][o.e.b.BootstrapChecks    ] [xr-data-node-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-09-24T07:32:50,676][DEBUG][o.e.d.SeedHostsResolver  ] [xr-data-node-1] using max_concurrent_resolvers [10], resolver timeout [5s]
[2019-09-24T07:32:50,676][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] activating with nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:50,676][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:56,645][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:56,661][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:56,661][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:56,661][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-09-24T07:32:56,661][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] opening probe connection
[2019-09-24T07:32:57,682][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:57,682][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:57,682][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:58,709][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:32:58,709][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:58,709][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:59,721][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

Continued:

[2019-09-24T07:32:59,721][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:32:59,721][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:32:59,737][DEBUG][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [][10.0.1.4:9300] connect_timeout[3s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:963) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.2.jar:7.3.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-09-24T07:33:00,702][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xr-data-node-1] master not discovered yet: have discovered [{xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.0.1.4:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2019-09-24T07:33:00,733][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:00,733][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:00,733][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:00,733][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-09-24T07:33:00,733][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] opening probe connection
[2019-09-24T07:33:01,750][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:01,750][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:01,750][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:02,768][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:02,768][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:02,768][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:03,764][DEBUG][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [][10.0.1.4:9300] connect_timeout[3s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:963) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.3.2.jar:7.3.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-09-24T07:33:03,779][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:03,779][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:03,779][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:03,779][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2019-09-24T07:33:03,779][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] opening probe connection
[2019-09-24T07:33:04,792][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

[2019-09-24T07:33:04,792][TRACE][o.e.d.SeedHostsResolver  ] [xr-data-node-1] resolved host [10.0.1.4:9300] to [10.0.1.4:9300]
[2019-09-24T07:33:04,808][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing resolved transport addresses [10.0.1.4:9300]
[2019-09-24T07:33:05,819][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] probing master nodes from cluster state: nodes:
   {xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{UgzWUcaCT5WvBycgM9Hkxw}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, local

The same messages are then printed repeatedly without end -- specifically those about probing transport addresses.

Thanks @Miao, that's helpful. This message is telling us there's still a basic connectivity problem preventing this node from connecting to 10.0.1.4:9300. Can you connect to that address from this node using something other than Elasticsearch? E.g. curl http://10.0.1.4:9300/ should return immediately with the message This is not an HTTP port.
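
Since these are Windows VMs, an equivalent check from PowerShell (available on Windows Server 2012 R2 and later) is:

Test-NetConnection -ComputerName 10.0.1.4 -Port 9300

TcpTestSucceeded should come back True if the port is reachable.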

Can you share your current NSG config? Are you sure that this config is being applied to both nodes?

Thank you for your reply. I ran the command curl http://10.0.1.4:9300, and received this response after a short wait:

curl: (7) Failed to connect to 10.0.1.4 port 9300: Timed out

Here is a screenshot of the inbound port rules for my NSG (I know taking a screenshot is not ideal, so I apologise):

Here are the outbound port rules for my NSG:

Here are the network interfaces belonging to the same NSG:

Hi Miao,

Earlier, I also received the same error. Could you please confirm whether you have enabled the parameter xpack.security.enabled: true?

Thank you for your reply, @chandu5565. I did not specify the xpack.security.enabled parameter inside elasticsearch.yml, and its default value is false, according to the Elasticsearch documentation.

@DavidTurner It turned out that the firewall settings on my master node VM did not allow incoming connections on port 9300, so I have fixed that. However, my data node still cannot discover the master node.
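
For reference, the firewall rule I added was along these lines (the rule name is my own):

netsh advfirewall firewall add rule name="Elasticsearch transport" dir=in action=allow protocol=TCP localport=9300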

Interestingly, the logs initially reported that both the handshake and the full connection were successful:

[2019-09-24T10:16:09,739][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] handshake successful: {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}
[2019-09-24T10:16:09,848][TRACE][o.e.d.HandshakingTransportAddressConnector] [xr-data-node-1] [connectToRemoteMasterNode[10.0.1.4:9300]] full connection successful: {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}
[2019-09-24T10:16:09,864][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode={xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2019-09-24T10:16:09,911][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] Peer{transportAddress=10.0.1.4:9300, discoveryNode={xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received 

Nonetheless, it failed to discover the master node. (More precisely, it discovered the master node, but did not recognise it as master-eligible.)

[2019-09-24T10:16:13,800][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xr-data-node-1] master not discovered yet: have discovered [{xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{P-1woOzkRjqTXaOIA3cAdg}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [10.0.1.4:9300] from hosts providers and [] from last-known cluster state; node term 30, last-accepted version 0 in term 0

Continued:

Finally, an exception was thrown:

[2019-09-24T10:16:30,965][INFO ][o.e.c.c.JoinHelper       ] [xr-data-node-1] failed to join {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{P-1woOzkRjqTXaOIA3cAdg}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=30, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={xr-data-node-1}{UNtswUpMTE2OzXD63PBkeQ}{P-1woOzkRjqTXaOIA3cAdg}{10.0.1.5}{10.0.1.5:9300}{di}{ml.machine_memory=3757625344, xpack.installed=true, ml.max_open_jobs=20}, targetNode={xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [xr-master-node][10.0.1.4:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [xr-data-node-1][10.0.1.5:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:972) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:161) ~[elasticsearch-7.3.2.jar:7.3.2]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.3.2.jar:7.3.2]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.3.2.jar:7.3.2]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[transport-netty4-client-7.3.2.jar:7.3.2]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:502) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:495) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:415) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:540) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:533) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:114) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.IOException: Connection timed out: no further information: 10.0.1.5/10.0.1.5:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) ~[?:?]
Caused by: java.io.IOException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) ~[?:?]

Well done on finding the firewall issue!

The exception you are seeing is to do with connectivity in the opposite direction: the master is trying to connect back to the data node and this connection is now timing out. Elasticsearch needs every node to be able to connect to every other node.

@DavidTurner Thank you for your prompt reply! I have also ensured that the firewall on my data node allows incoming connections on port 9300, so the failure to form a cluster is puzzling to me. :frowning:

I think there's some other network-level filtering going on somewhere. I would suggest the same experiment using curl as before, except in the opposite direction.
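
That is, on the master node:

curl http://10.0.1.5:9300/

If the connection is working, it should again answer immediately with This is not an HTTP port.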

@DavidTurner I currently see the following messages on my data node while trying to form a cluster:

[2019-09-24T12:09:43,575][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] not active
[2019-09-24T12:09:44,842][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] deactivating and setting leader to {xr-master-node}{MoQwDgZWTCy_vOT5Wm6ttw}{Z0FJSnz9RlGiVXw5mZ-6tA}{10.0.1.4}{10.0.1.4:9300}{dim}{ml.machine_memory=8589463552, ml.max_open_jobs=20, xpack.installed=true}
[2019-09-24T12:09:44,856][TRACE][o.e.d.PeerFinder         ] [xr-data-node-1] not active 

These messages are printed repeatedly, without end.

What does it mean to say that xr-data-node-1 is not active? Is it something I should fix?

No, these are TRACE-level logs, intended to be read alongside the source code for low-level debugging only. They don't indicate anything that needs fixing.

@DavidTurner Thank you for your prompt reply! So, presumably, I have successfully created a cluster?

My apologies for the silly questions, and thank you very much for your help so far -- you have been really patient and helpful.