Cluster not re-forming after simulating split-brain using iptables

Hi,
I wanted to understand how Elasticsearch behaves in a split-brain situation.

So I created a two-node cluster with a single index. Once the cluster was established, I installed iptables rules as follows:

node1 rule:
iptables -I OUTPUT 1 -d <node2_ip> --jump DROP

node2 rule:
iptables -I OUTPUT 1 -d <node1_ip> --jump DROP
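A minimal sketch of the same experiment as a reusable script, assuming it runs as root and that the other node's address is supplied in a hypothetical PEER_IP variable. Healing uses iptables -D with the same rule specification, which deletes the matching rule. The DRY_RUN switch is also hypothetical: it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: simulate and heal a partition from this node to its peer.
# PEER_IP is a hypothetical variable; set it to the other node's address.
PEER_IP="${PEER_IP:-192.0.2.2}"

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drop every outgoing packet to the peer (same rule as in the post).
partition() {
    run iptables -I OUTPUT 1 -d "$PEER_IP" --jump DROP
}

# Remove the rule again by repeating its specification with -D.
heal() {
    run iptables -D OUTPUT -d "$PEER_IP" --jump DROP
}
```

Source the script with PEER_IP set on each node, call partition on both nodes, wait until each one reports itself as a one-node cluster, then call heal on both.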

As expected, after some time node1 had become a one-node cluster and node2 an independent one-node cluster.

However, when I then delete the iptables entries, the cluster does NOT re-form. Multicast is disabled on my on-prem servers, so discovery is unicast only.

My unicast hosts setting on node1:
discovery.zen.ping.unicast.hosts: ["<node2_ip>"]

My unicast hosts setting on node2:
discovery.zen.ping.unicast.hosts: ["<node1_ip>"]

So how can I get the cluster to re-form at this point? Is restarting the elasticsearch service on one of the nodes the only option?

Appreciate any responses. Thanks.

Is there a reason you can't add a third, master-only node? The whole point is to avoid a split-brain situation, because you will probably lose data when the clusters rejoin.

As for why the cluster hasn't re-formed, you'll need to share the log files showing any errors. Also telnet from each node to port 9300 on the other and check that the port is open.
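If telnet isn't installed, a scriptable check can stand in for it. This is a sketch that assumes bash (for its /dev/tcp redirection) and coreutils' timeout are available; the function name check_port is made up for illustration. It only shows whether a TCP connection succeeds within three seconds, not whether Elasticsearch itself is healthy.

```shell
#!/bin/sh
# Sketch: scriptable stand-in for "telnet <node> 9300".
# Prints "<host>:<port> open" if a TCP connection succeeds within 3 seconds,
# otherwise "<host>:<port> unreachable" (closed, filtered, or timed out).
# Requires bash (for /dev/tcp) and coreutils' timeout.
check_port() {
    host="$1"
    port="${2:-9300}"   # 9300 is the default transport port
    if timeout 3 bash -c ": </dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port open"
        return 0
    fi
    echo "$host:$port unreachable"
    return 1
}
```

Run check_port <node2_ip> on node1 and check_port <node1_ip> on node2 after the iptables rules have been deleted; both should report open.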

Did you set minimum_master_nodes to 2?

Umm, no. minimum_master_nodes is at its default value (which should be 1).

If minimum_master_nodes were set to 2, there would be no cluster at all,
rather than two individual one-node clusters.

It appears that the nodes are not broadcasting discovery events, since each
node has been online the whole time. Each one waits for network chatter from
other nodes joining the cluster.

Perhaps the problem is related to the fact that unicast discovery does not
attempt to resolve IPs after startup?

Uhm, isn't that by design? As soon as your cluster falls apart into separate clusters because minimum_master_nodes is too low, it will never become one single cluster again. How could it? There might be conflicts. That is at least the behavior I observed when I experimented with it, though that was on version 1.7 or earlier.

Except that if minimum_master_nodes cannot be satisfied, both nodes will refuse any connection, so I'm not sure how you could get drift in that case.

Unfortunately, we also have to keep working when only one node is operational, so I can't set minimum_master_nodes to 2. I'll have to make do.

Precisely. node1 thinks node2 has died (because node1's iptables rule dropped all packets going to node2), and vice versa.

This just simulates the case where one of the nodes becomes unreachable (say, due to routing issues). I'd think there HAS to be a retry, perhaps with exponential backoff?

Does anyone know of, or have a link to, the relevant parts of the code base that I can explore? It seems wrong that there is no retry in my case.

If you have two nodes with minimum_master_nodes set to 1 and you simulate a network partition, each will elect itself as master, creating two clusters with the same name.
Because both see themselves as master, neither will ever try to reconnect to the other node.
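One way to confirm this state is to ask each node which node it believes is master. Below is a sketch using the _cat/master API with h=node to select only the node-name column, assuming the default HTTP port 9200. The helper names master_of and compare_masters are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the master each node reports via _cat/master.
# Assumes the default HTTP port 9200 on both nodes.
master_of() {
    # Trim trailing padding that cat APIs may add to columns.
    curl -s "http://$1:9200/_cat/master?h=node" | sed 's/[[:space:]]*$//'
}

# Prints whether the two nodes agree on a single master.
compare_masters() {
    m1="$(master_of "$1")"
    m2="$(master_of "$2")"
    if [ -z "$m1" ] || [ -z "$m2" ]; then
        echo "could not get a master from both nodes"
        return 2
    fi
    if [ "$m1" != "$m2" ]; then
        echo "split-brain: $1 -> $m1, $2 -> $m2"
        return 1
    fi
    echo "single master: $m1"
}
```

If it reports split-brain, restarting the elasticsearch service on one node (with the RPM on CentOS 7, presumably sudo systemctl restart elasticsearch) should let that node discover the other master and join it. Any writes the restarted node accepted on its own during the partition may be lost.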

When you run with only 2 nodes, minimum_master_nodes needs to be set to 2 in order to avoid separate clusters forming in the event of a network partition. With this setting, the cluster will be masterless until both nodes are available and will serve reads but reject writes. The reason for this is to avoid the risk of data loss.
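The usual rule for minimum_master_nodes is a majority of the master-eligible nodes, floor(N / 2) + 1, which is 2 for both N = 2 and N = 3. Here is a sketch that computes it and applies it through the cluster settings API, where discovery.zen.minimum_master_nodes is a dynamic setting in 1.x/2.x. ES_URL and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compute and apply a quorum-based minimum_master_nodes.
# ES_URL is a hypothetical variable; point it at any node's HTTP endpoint.
ES_URL="${ES_URL:-http://localhost:9200}"

# Majority of master-eligible nodes: floor(n / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# JSON body for a persistent cluster-settings update.
min_masters_body() {
    printf '{"persistent":{"discovery.zen.minimum_master_nodes":%s}}' "$1"
}

# Apply the setting to the running cluster.
set_min_masters() {
    curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(min_masters_body "$1")"
}
```

For example, set_min_masters "$(quorum 2)" applies 2. Do this while both nodes are in the same cluster, and also put it in elasticsearch.yml so that new nodes start with the same value.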

As with most systems that rely on consensus-based master election, 3 nodes is the magic number. If you introduce a small third node that is a dedicated master node, it will act as a tiebreaker and allow one side of the partition to elect a master in the case of a network partition. That side of the partition would continue to take writes, while the single partitioned node would only serve reads.
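Here is a sketch of the elasticsearch.yml settings such a layout might use, with the 1.x/2.x setting names node.master, node.data, discovery.zen.ping.unicast.hosts and discovery.zen.minimum_master_nodes. It is written as a small generator that prints the fragment. The function name node_config and the IP addresses are placeholders.

```shell
#!/bin/sh
# Sketch: print elasticsearch.yml fragments for two data nodes plus a small
# dedicated master node acting as tiebreaker (1.x/2.x setting names).
# Usage: node_config <data|master-only> <ip1> <ip2> <ip3>
node_config() {
    role="$1"; shift
    hosts=""
    for ip in "$@"; do
        hosts="${hosts:+$hosts, }\"$ip\""
    done
    echo "node.master: true"
    if [ "$role" = "master-only" ]; then
        echo "node.data: false"   # tiebreaker holds no shards
    else
        echo "node.data: true"
    fi
    echo "discovery.zen.ping.unicast.hosts: [$hosts]"
    # 3 master-eligible nodes, so a quorum of 2.
    echo "discovery.zen.minimum_master_nodes: 2"
}
```

For example, node_config master-only 192.0.2.1 192.0.2.2 192.0.2.3 prints the fragment for the tiebreaker. Review it before merging it into that node's elasticsearch.yml so that no keys are duplicated.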

This is, for example, what Found does: if you provision a cluster across two availability zones, it automatically provides a free third dedicated master node that acts as a tiebreaker, because this helps improve availability and stability.


What version of Elasticsearch?

Sorry, I should have mentioned it up front. It's 2.1.1:

[admin@CentOS7 src]$ rpm -qa | grep elasticsearch
elasticsearch-2.1.1-1.noarch

[admin@CentOS7 src]$ uname -a
Linux CentOS7 3.4.10.0.0xxxxxx-0 #1 SMP Fri Nov 20 19:05:44 PST 2015 x86_64 x86_64 x86_64 GNU/Linux

You are correct. My split-brain issues with Elasticsearch always involved one
or more nodes belonging to two different clusters, never truly disjoint clusters.