Split brain and ES node in multiple clusters

Hi,

I have a 3-node ES cluster and I've set discovery.zen.minimum_master_nodes =
2 and node.max_local_storage_nodes = 1. The nodes are deployed on NFS so
that if a machine fails, an ES process can be started on the same data path
from a different machine. Each machine has access to the data directories
of all ES nodes.

The initial deployment was:
ES1 - Node1:9300 (path.data=/nfs/node1)
ES2 - Node2:9300 (path.data=/nfs/node2)
ES3 - Node3:9300 (path.data=/nfs/node3)
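For reference, each node's elasticsearch.yml looks roughly like this (the
cluster name here is a placeholder; path.data and ports differ per node):

    # elasticsearch.yml on Node2 (sketch)
    cluster.name: mycluster                  # placeholder name
    path.data: /nfs/node2                    # per-node data directory on NFS
    discovery.zen.minimum_master_nodes: 2    # quorum for a 3-node cluster
    node.max_local_storage_nodes: 1          # one node per data path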

Then I took the following steps:

  1. Killed the network on Node2, which resulted in a cluster with two nodes
    (ES1, ES3). The ES2 process was still running on Node2.

  2. Started a new ES process on Node1 with path.data=/nfs/node2. I assumed
    that, since node.max_local_storage_nodes = 1, it would not start, because
    ES2 on Node2:9300 already held a lock on that path, but it started anyway.
    The cluster now looked like:
    ES1 - Node1:9300 (path.data=/nfs/node1)
    ES2 - Node1:9301 (path.data=/nfs/node2)
    ES3 - Node3:9300 (path.data=/nfs/node3)

ES2 - Node2:9300 (path.data=/nfs/node2) was also still running, but it was
not part of the cluster.
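(If I understand the locking correctly, node.max_local_storage_nodes relies
on a lock file under the data path, roughly like this, assuming the default
layout and my placeholder cluster name:

    /nfs/node2/mycluster/nodes/0/node.lock

NFS frequently does not enforce such locks across machines, which would
explain why the second process could start.)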

  3. I brought the network on Node2 back up, which resulted in the following
    two clusters:
    Cluster1:
    ES1 - Node1:9300 (path.data=/nfs/node1)
    ES2 - Node1:9301 (path.data=/nfs/node2)
    ES3 - Node3:9300 (path.data=/nfs/node3)

Cluster2:
ES1 - Node1:9300 (path.data=/nfs/node1)
ES2 - Node2:9300 (path.data=/nfs/node2)

Now Node1:9300 is participating in both clusters, which doesn't seem right
to me.
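(One way to check which master each node follows, assuming HTTP is enabled
on the default port 9200:

    curl -s 'http://Node1:9200/_cluster/state?pretty' | grep master_node
    curl -s 'http://Node2:9200/_cluster/state?pretty' | grep master_node
    curl -s 'http://Node3:9200/_cluster/state?pretty' | grep master_node

If the outputs disagree, the nodes are following different masters.)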

Is there any way to restrict the participation of an ES node to a single
cluster? Also, can I specify a timeout somewhere after which an ES node
will die if the minimum number of master nodes is not reachable?

Anand


Why don't you just use path.data=/nfs ?

You can give your cluster a name in the node config, but I'm sure this is
not what you are looking for.

There is no proven fault tolerance against network connection failures
between nodes in ES, only against node failures.

There is no timeout in minimum_master_nodes because the idea is to wait for
a number of nodes to be connected to each other before a leader (master) is
elected. What should happen after the timeout, except more waiting?

Jörg


Hi Jörg,

Please find my answers inline.

Thanks,
Anand

On Wednesday, 28 August 2013 16:58:44 UTC+5:30, Jörg Prante wrote:

Why don't you just use path.data=/nfs ?

A. I'm not using /nfs as path.data because I want finer control over which
directory is used on a particular machine.
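For example, I can start each process with explicit overrides (a sketch;
settings can be passed as -Des.* system properties):

    bin/elasticsearch -f -Des.path.data=/nfs/node2 -Des.transport.tcp.port=9300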

You can give your cluster a name in the node config, but I'm sure this is
not what you are looking for.

A. The problem is that I now have two clusters with the same name, one of
them in green state. Worse, I have one node participating in two clusters.
Is there any way of preventing that?

There is no proven fault tolerance against network connection failures
between nodes in ES, only against node failures.

There is no timeout in minimum_master_nodes because the idea is to wait
for a number of nodes to be connected to each other before a leader
(master) is elected. What should happen after the timeout, except more
waiting?

A. The process could die after failing to find the required number of
eligible masters within a given time interval.

Jörg


Using /nfs would give you the control you need; subfolders are already
created for each node.
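With path.data=/nfs, each node takes its own subfolder under the cluster
directory, roughly like this (assuming the default layout and a placeholder
cluster name):

    /nfs/mycluster/nodes/0
    /nfs/mycluster/nodes/1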

Participating in two clusters is not actually possible for a node - there
would be strong conflicts in the internal state, because each node holds
only one cluster state. What is possible is that two masters both list the
node in their cluster states. This is called a "split brain"; the problem
is not the node, but the two masters.

If the process died instead of waiting, you would have to monitor and
restart nodes over and over again while a large cluster comes up slowly,
which is contrary to the aim of minimum_master_nodes.

Jörg
