Also, since host 192.168.1.167 has node.master set to false, there is no reason to specify it in discovery.zen.ping.unicast.hosts. If you only have two nodes, I strongly suggest setting node.master to true on both nodes.
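As a sketch, the elasticsearch.yml on each of the two nodes could then look like this (only 192.168.1.167 appears in the thread; the other address is a placeholder you'd replace with your second node's IP):

```yaml
# elasticsearch.yml — sketch for a two-node setup where both nodes
# are master-eligible (second address is a placeholder, not from the thread)
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["192.168.1.167", "<ip-of-other-node>"]
```

With node.master: true on both hosts, both belong in the unicast host list, since that list is how nodes find the master-eligible nodes during discovery.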
No, with two nodes, 1 is sufficient, because if one node were to go down, the cluster would go red immediately, as it would need two master-eligible nodes to be present in order to hold an election.
However, also note that with two master-eligible nodes, you run the risk of a split-brain situation. For instance, if the network gets partitioned, each master could decide to form its own cluster, and that's bad, too.
That's why you need to have at least 3 master-eligible nodes (+ discovery.zen.minimum_master_nodes: 2) in order to be certain that your cluster will stay healthy even if one master goes down.
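A minimal sketch of that recommended setup, identical on each of the three nodes (pre-7.x Zen discovery settings, as discussed in this thread; hostnames are illustrative):

```yaml
# elasticsearch.yml — sketch for three master-eligible nodes
# (hostnames are placeholders, not from the thread)
node.master: true
discovery.zen.ping.unicast.hosts: ["node-1", "node-2", "node-3"]
# quorum of 3 master-eligible nodes: an election needs 2 of them present
discovery.zen.minimum_master_nodes: 2
```

With this configuration the cluster survives the loss of any single node: the remaining two still form a quorum and can elect a master.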
The cluster being red just indicates that a primary shard is offline. Using the default of 5 primary shards and 1 replica, each of the two nodes would get 5 shards (10 total). A primary shard and its replica (copy) will never be assigned to the same node. This means that if one node is removed, the cluster would be yellow (missing/unassigned replicas), because the node that is still online can have all five primary shards allocated.

This seems great, but in the reality of distributed computing it is bad. Let's say each of the two nodes is connected to a different switch. On switch A and switch B you have clients actively writing data. If the link between the two switches breaks, the two Elasticsearch nodes cannot communicate with each other, while the clients writing data can still talk to the local node on the same switch. Since you only require 1 master node for a valid cluster, each node could become an operational cluster (just yellow due to unassigned replicas).

Before the network interruption, the cluster was working as intended. Once interrupted, you have allowed the cluster to split and diverge into two parallel timelines, each with new data being written and updated. When the network link is repaired, which timeline should be used as the correct/current state?
@jpcarey you're right according to the formula, but if you set that setting to 2 and one node goes down, no master election can take place and the cluster cannot operate, right?
Which means that a two-node cluster with two master-eligible nodes cannot tolerate any single node failure, which is why three master-eligible nodes are the minimum for a solid production cluster.
For the sake of resiliency, I'd even venture further and say that there should be a bootstrap check that requires at least 3 master-eligible nodes (+ discovery.zen.minimum_master_nodes: 2) when starting in production mode.
With 2 nodes you cannot have a fully highly available setup. If one of the nodes is missing or partitioned off, both nodes need to stop accepting writes in order to avoid data loss. This means that the formula needs to be followed. You will, however, be able to serve reads based on the local data.
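The formula referred to here is the usual quorum rule for Zen discovery: minimum_master_nodes = (master_eligible_nodes / 2) + 1, using integer division. Worked through for the cluster sizes in this thread:

```yaml
# quorum = floor(master_eligible_nodes / 2) + 1
#   2 master-eligible nodes -> quorum 2: any single failure stops elections
#   3 master-eligible nodes -> quorum 2: one node can fail safely
#   4 master-eligible nodes -> quorum 3: still only one safe failure
discovery.zen.minimum_master_nodes: 2
```

Note that going from 3 to 4 master-eligible nodes raises the quorum without letting you tolerate more failures, which is another reason 3 is the usual recommendation.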
In the case where you only have 2 nodes, it may actually seem better to have only one of the nodes configured as master-eligible, as at least this node can continue operating fully if the other node fails. But this can make recovery more difficult if the master-eligible node is the one that fails.
If you want highly available writes and a more resilient system, you need a minimum of 3 master-eligible nodes, which is what we always recommend for production.