minimum_master_nodes prevents nodes from joining the cluster


(Michel Conrad) #1

Hi,
Thanks Shay for the extensive debugging yesterday on IRC.
Everything seems much clearer now, but I think I found a situation
where joining the cluster does not work as expected.

I got the following situation during startup:

minimum_master_nodes set to 3
master:true on 5 nodes,
master:false on the other nodes.
unicast discovery with hosts set to the 5 nodes that can become master.

During initial cluster formation everything works fine: a master is
elected and the other nodes join the cluster.

If I remove a non-master node from the cluster and then try to rejoin
it, the node no longer joins.
It only gets a ping response from the elected master and does not join
the cluster. I suspect it's because minimum_master_nodes is
set to 3 and the unelected masters are somehow not responding to the ping.

If I restart another node, the first node gets a ping response from
the master (with master=true, [master=Cerberus...]) and from the other
node that is trying to join the cluster (with master=false, master
[null]). It still does not join any cluster.

Only if I set minimum_master_nodes to 1 does the node join the cluster,
but I wanted the node to be able to join only a cluster with 3 master
nodes, to prevent split-brain.
Or is split-brain already prevented if I set minimum_master_nodes to 3
on the master:true nodes and minimum_master_nodes to 1 on the
master:false nodes?
I also suspect that if I restart one of the 5 master:true nodes, that
one will not be able to rejoin the cluster either.
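
For reference, the reasoning behind picking 3 out of 5 can be sketched
as below. This is only an illustration, assuming the usual majority
rule (half of the master-eligible nodes plus one); the helper names
majority_quorum and can_elect are hypothetical and not part of
Elasticsearch.

```python
def majority_quorum(master_eligible: int) -> int:
    """Smallest node count that is a strict majority (assumed rule)."""
    return master_eligible // 2 + 1


def can_elect(visible_master_eligible: int, minimum_master_nodes: int) -> bool:
    """A partition may elect a master only if it sees enough eligible nodes."""
    return visible_master_eligible >= minimum_master_nodes


# With 5 master:true nodes, a majority is 3.
quorum = majority_quorum(5)

# Hypothetical network split of the 5 master-eligible nodes into 3 + 2.
side_a, side_b = 3, 2

# With minimum_master_nodes = 3, only one side can elect a master.
split_brain_with_3 = can_elect(side_a, quorum) and can_elect(side_b, quorum)

# With minimum_master_nodes = 1, both sides could elect their own master.
split_brain_with_1 = can_elect(side_a, 1) and can_elect(side_b, 1)
```

Under these assumptions, a value of 3 lets at most one side of any
split elect a master, while a value of 1 lets both sides do so.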

Best,
Michel


(Michel Conrad) #2

Hi,
changing the configuration file solved the joining issue:

When the hosts are comma-separated and there is whitespace after each
comma, the debug message still prints the hosts correctly:
[DEBUG][discovery.zen.ping.unicast] [Jackson Arvad] using initial
hosts [192.168.5.1[9300], 192.168.5.2[9300], 192.168.5.3[9300],
192.168.5.4[9300], 192.168.5.5[9300]]

But only the first host is actually used; the other four fail with a
java.nio.channels.UnresolvedAddressException in the netty layer.

I think it would be helpful if the addresses were trimmed before
being passed to netty, or if a warning were logged, in order to avoid
similar errors in the future.
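
The trimming suggested above could look roughly like the following.
This is a minimal sketch in Python rather than the actual Elasticsearch
code; the function name parse_unicast_hosts and the sample setting
string are assumptions made for illustration.

```python
from typing import List


def parse_unicast_hosts(raw: str) -> List[str]:
    """Split a comma-separated host list and strip surrounding whitespace."""
    hosts = []
    for entry in raw.split(","):
        host = entry.strip()
        if host:  # skip empty entries, e.g. from a trailing comma
            hosts.append(host)
    return hosts


# Hypothetical setting value with a space after each comma.
raw_setting = "192.168.5.1, 192.168.5.2, 192.168.5.3"

# A plain split keeps the leading space on every host but the first,
# which is the kind of value that cannot be resolved as an address.
untrimmed = raw_setting.split(",")

# Trimming each entry yields clean host names.
trimmed = parse_unicast_hosts(raw_setting)
```

Only the first entry of the plain split is free of whitespace, which
would match the observation that only the first host was usable.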

Regards,
Michel



(Shay Banon) #3

:) That explains a lot of the things we saw yesterday, then. I opened
an issue: https://github.com/elasticsearch/elasticsearch/issues/1193.


