Cluster error

I'm trying to create a test cluster

Here is my cluster configuration information:

server esapp, node-1:

node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["", ""]

server esapp, node-2:

node.master: false
node.data: true
discovery.zen.ping.unicast.hosts: ["", ""]

node-1 starts normally, but node-2 displays this error:

```
[INFO ][o.e.d.z.ZenDiscovery ] [node-2] failed to send join request to master [{node-1}{b1WM0GMgSwap-C6l00F-LA}{N9-uS40OSYaEeUVCfnRgpw}{}{}], reason [RemoteTransportException[[node-1][][internal:discovery/zen/join]]; nested: ConnectTransportException[[node-2][] connect_exception]; nested: IOException[No route to host:]; nested: IOException[No route to host]; ]
```

How can I solve this error? Please help, thank you.

That would suggest you have some networking issues outside of Elasticsearch.
Can you ping each host?
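Ping alone is not conclusive: "No route to host" on the join request usually means the Elasticsearch transport port (9300 by default) is blocked, often by a firewall, even when ICMP gets through. A minimal Python sketch to test TCP reachability of that port (host names and the port are assumptions to substitute with your own):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, and timeouts.
        return False

# Example: from node-2, check node-1's transport port (9300 is the default).
# print(can_reach("node-1", 9300))
```

If ping works but this returns False, check firewall rules (e.g. iptables/firewalld) on both hosts for port 9300.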

Also, since node-2 has node.master set to false, there is no reason to list it in discovery.zen.ping.unicast.hosts. If you only have two nodes, I strongly suggest setting node.master to true on both of them.
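As a sketch, the two elasticsearch.yml files could look like this; the cluster name and the 10.0.0.x addresses are placeholders, not values from your post:

```yaml
# node-1 (placeholder addresses -- substitute your own)
cluster.name: my-cluster
node.name: node-1
node.master: true
node.data: true
network.host: 10.0.0.1
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2"]
```

```yaml
# node-2 (placeholder addresses -- substitute your own)
cluster.name: my-cluster
node.name: node-2
node.master: true
node.data: true
network.host: 10.0.0.2
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2"]
```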


They can ping each other.

If I set node.master to true on both nodes, should I set discovery.zen.minimum_master_nodes: 2?

No, with two nodes, 1 is sufficient: if one node were to go down with that setting at 2, the cluster would go red immediately, because it would need two master-eligible nodes present in order to hold an election.

However, also note that with two master-eligible nodes, you run the risk of having a split brain situation. For instance if the network gets partitioned, each master could decide to create their own cluster and that's bad, too.

That's why you need to have at least 3 master-eligible nodes (+ discovery.zen.minimum_master_nodes: 2) in order to be certain that your cluster will stay healthy even if one master goes down.
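For the recommended three-node setup, the relevant settings could look like this on each node (the cluster name and addresses are illustrative; only node.name and network.host differ per node):

```yaml
# identical on all three nodes except node.name / network.host
cluster.name: my-cluster
node.master: true
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
```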

OK, thank you for your advice.
But the original problem is still not solved. Can you give some advice on that, please?

For two master nodes, it should be set to 2.
(master_eligible_nodes / 2) + 1
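The formula uses integer division, so a one-line sketch (the function name is mine, not an Elasticsearch API):

```python
def quorum(master_eligible_nodes):
    """Quorum formula: (master_eligible_nodes / 2) + 1, with integer division."""
    return master_eligible_nodes // 2 + 1

print(quorum(2))  # 2
print(quorum(3))  # 2
print(quorum(5))  # 3
```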

The cluster being red just indicates that a primary shard is offline. With the default of 5 primary shards and 1 replica, each of the two nodes would get 5 shards (10 in total). A primary shard and its replica (copy) will never be assigned to the same node. This means that if one node is removed, the cluster would be yellow (missing/unassigned replicas), because the node that is still online can hold all five primary shards.

This seems great, but in the reality of distributed computing it is bad. Say each of the two nodes is connected to a different switch, and clients on both switch A and switch B are actively writing data. If the link between the two switches breaks, the two Elasticsearch nodes cannot communicate with each other, while the clients can still talk to the local node on their own switch. Since you only require 1 master node for a valid cluster, each node becomes an operational cluster of its own (just yellow due to unassigned replicas).

Before the network interruption, the cluster was working as intended. Once interrupted, you have allowed the cluster to split and diverge into two parallel timelines, each receiving new writes and updates. When the network link is repaired, which timeline should be used as the correct / current state?
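The failure mode above can be sketched as a toy election check, not Elasticsearch's actual election code: a partition can elect a master only if it still sees at least minimum_master_nodes master-eligible nodes.

```python
def can_elect(partition_size, minimum_master_nodes):
    """A network partition can elect a master only if it contains a quorum."""
    return partition_size >= minimum_master_nodes

# Two master-eligible nodes, minimum_master_nodes left at 1:
# after a partition, each side of size 1 still elects a master -> split brain.
print([can_elect(s, 1) for s in [1, 1]])  # [True, True]

# With minimum_master_nodes: 2, neither side can elect -> no split brain,
# but the cluster also cannot operate until the partition heals.
print([can_elect(s, 2) for s in [1, 1]])  # [False, False]

# Three master-eligible nodes with minimum_master_nodes: 2 and a 2/1 split:
# only the majority side keeps operating.
print([can_elect(s, 2) for s in [2, 1]])  # [True, False]
```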

@jpcarey you're right according to the formula, but if you set that setting to 2 and one node goes down, no master election can take place and the cluster cannot operate, right?

Which means that a two-node cluster with two master-eligible nodes cannot tolerate any single node failure, which is why three master-eligible nodes is the minimum for a solid production cluster.

For the sake of resiliency, I'd even venture further and say that there should be a bootstrap check that requires at least 3 master-eligible nodes (+ discovery.zen.minimum_master_nodes: 2) when starting in production mode.

With 2 nodes you cannot have a fully highly available setup. If one of the nodes is missing or partitioned off, both nodes need to stop accepting writes in order to avoid data loss, which means the formula needs to be followed. You will, however, still be able to serve reads based on the local data.

If you only have 2 nodes, it may actually be better to have just one of them configured as master-eligible, as at least that node can continue operating fully if the other node fails; but this can make recovery more difficult if the master-eligible node is the one that fails.

If you want highly available writes and a more resilient system, you need a minimum of 3 master-eligible nodes, which is what we always recommend for production.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.