Thanks Shay for the extensive debugging yesterday on IRC.
Everything seems much clearer now, but I think I found a situation
where joining the cluster does not work as expected.
I have the following setup at startup:
minimum_master_nodes set to 3,
master:true on 5 nodes,
master:false on the other nodes,
unicast discovery with hosts set to the 5 nodes that can become master.
During initial cluster formation everything works fine: a master is
elected and the other nodes join the cluster.
If I remove a non-master node from the cluster and then try to rejoin
it, the node no longer joins.
It only gets a ping response from the elected master and does not join
the cluster. I suspect it's because minimum_master_nodes is set to 3
and the unelected master-eligible nodes are somehow not responding to
the ping.
If I restart another node, the first node gets a ping response from
the master (with master=true, [master=Cerberus...]) and from the other
node that is trying to join the cluster (with master=false, master
[null]). It still does not join any cluster.
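
To make my suspicion concrete, here is a toy model in Python. This is
not Elasticsearch code: PingResponse, would_join and the node names are
hypothetical, and the rule itself is only my guess at what the joining
node might be doing with the ping responses it receives.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PingResponse:
    node: str                    # placeholder name of the responding node
    master_eligible: bool        # the master=true/false attribute in the response
    known_master: Optional[str]  # the master that node reports, None for [null]


def would_join(responses: List[PingResponse], minimum_master_nodes: int) -> bool:
    """Hypothetical rule: join only if enough master-eligible nodes answered."""
    eligible = [r for r in responses if r.master_eligible]
    return len(eligible) >= minimum_master_nodes


# What the rejoining node sees in my case: the elected master plus one
# other node that is itself still trying to join.
responses = [
    PingResponse("elected-master", True, "elected-master"),
    PingResponse("joining-node", False, None),
]

print(would_join(responses, 3))  # False
print(would_join(responses, 1))  # True
```

If the real check looks anything like this, a single response from the
elected master would never satisfy a threshold of 3.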
Only if I set minimum_master_nodes to 1 does the node join the cluster,
but I thought I wanted the node to be able to join only a cluster with
3 master nodes, to prevent split-brain.
Or is split-brain already prevented if I set minimum_master_nodes to 3
on the master:true nodes and minimum_master_nodes to 1 on the
master:false nodes?
The one thing I suspect is that if I restart one of the 5 master:true
nodes, that node will not be able to rejoin the cluster either.
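
For reference, this is the arithmetic behind my choice of 3: with 5
master-eligible nodes, 3 is the smallest majority, so two separate
partitions can never both contain 3 of them. A quick sketch (the
function names are mine, not anything from Elasticsearch):

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_split_brain(master_eligible_nodes: int, minimum_master_nodes: int) -> bool:
    """True if two disjoint groups could each reach minimum_master_nodes."""
    return 2 * minimum_master_nodes <= master_eligible_nodes


print(quorum(5))              # 3
print(can_split_brain(5, 3))  # False
print(can_split_brain(5, 2))  # True
```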