Split brain problem on multi-node-single-machine installation

uwe_bartels · March 25, 2015, 10:44am

Hiho,

Because Elasticsearch is installed on physical hardware rather than on virtual
machines, I'm running several nodes on one machine: 1 client node (master: false,
data: false) and 2 data-master nodes (master: true, data: true).

The two data-master nodes listen on 0.0.0.0 without a configured port, so

node1 is automatically listening on port 9300 and
node2 is listening on port 9301.
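As a rough illustration of why node1 ends up on 9300 and node2 on 9301: when no port is configured, each node's transport layer binds the first free port in the 9300-9400 range, so the port a node gets depends on start order. The sketch below models only that rule; `first_free_port` and the `occupied` set are hypothetical names for illustration, not Elasticsearch code.

```python
def first_free_port(occupied, start=9300, end=9400):
    """Hypothetical model of transport port selection: return the lowest port
    in [start, end] that is not in `occupied`, or None if all are taken."""
    for port in range(start, end + 1):
        if port not in occupied:
            return port
    return None


# Two nodes started one after the other on the same machine.
occupied = set()
node1 = first_free_port(occupied)
occupied.add(node1)
node2 = first_free_port(occupied)
occupied.add(node2)
print(node1, node2)  # 9300 9301

# Rolling restart: node1 stops (freeing 9300) and starts again.
occupied.discard(node1)
print(first_free_port(occupied))  # 9300
```

The model only covers which port gets bound; it says nothing about how a restarted node finds the nodes sitting on the other ports.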

Now I'm implementing a rolling restart and run into the split-brain
problem:

node1 stops listening on 9300, and node2 elects itself master, still
listening on port 9301.
When node1 starts again, it does not see the cluster anymore and elects
itself master too. It doesn't seem to check the port range 9300-9400.

So what would be the right way to configure Elasticsearch so that it does not
run into the split-brain problem?

Any help is appreciated.
Best...
Uwe

This is the corresponding log output (hostnames and node names have been
changed):

Startup of node1:

[2015-03-25 11:35:00,889][INFO ][transport ] [node1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.0.123:9300]}
[2015-03-25 11:35:01,214][INFO ][discovery ] [node1] chef-test/PPrAcCFuRbitjRyL9v0Dnw

Startup of node2:

[2015-03-25 11:35:05,374][INFO ][transport ] [node2] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/10.0.0.123:9301]}
[2015-03-25 11:35:05,709][INFO ][discovery ] [node2] chef-test/9XW7aeBMSIq0LAr1jumuow
[2015-03-25 11:35:20,203][INFO ][cluster.service ] [node2] new_master [node2][9XW7aeBMSIq0LAr1jumuow][host1][inet[/10.0.0.123:9301]]{aws_availability_zone=eu-west-1b, master=true}, reason: zen-disco-join (elected_as_master)
[2015-03-25 11:35:20,260][INFO ][cluster.service ] [node2] added {[node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true},}, reason: zen-disco-receive(join from node[[node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true}])

Stop node1, log on node2:
[2015-03-25 11:35:52,660][INFO ][cluster.service ] [node2] removed {[node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true},}, reason: zen-disco-node_left([node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true})

Start node1, log on node1:
[2015-03-25 11:38:36,043][INFO ][transport ] [node1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.0.123:9300]}
[2015-03-25 11:38:36,156][INFO ][discovery ] [node1] chef-test/iqurXSk9RNuf7YWUGHpFKg
[2015-03-25 11:38:43,829][INFO ][cluster.service ] [node1] new_master [node1][iqurXSk9RNuf7YWUGHpFKg][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true}, reason: zen-disco-join (elected_as_master)

The log on node2 does not change during the startup of node1, of course.
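One way to see the split brain in these logs is to compare which master each node has announced. The Python sketch below does that with a regular expression. `masters_by_node` and the pattern are hypothetical helpers; they assume one log entry per line in the 1.x format shown above and ignore timing and node_left events, so treat the result as a rough heuristic only.

```python
import re

# Matches "[reporting_node] new_master [elected_node]" in log lines like the ones above.
NEW_MASTER = re.compile(
    r"\]\s+\[(?P<reporter>[^\]]+)\]\s+new_master\s+\[(?P<master>[^\]]+)\]"
)


def masters_by_node(log_lines):
    """Map each reporting node to the last master it announced."""
    view = {}
    for line in log_lines:
        m = NEW_MASTER.search(line)
        if m:
            view[m.group("reporter")] = m.group("master")
    return view


logs = [
    "[2015-03-25 11:35:20,203][INFO ][cluster.service ] [node2] new_master "
    "[node2][9XW7aeBMSIq0LAr1jumuow][host1][inet[/10.0.0.123:9301]]"
    "{aws_availability_zone=eu-west-1b, master=true}, reason: zen-disco-join (elected_as_master)",
    "[2015-03-25 11:38:43,829][INFO ][cluster.service ] [node1] new_master "
    "[node1][iqurXSk9RNuf7YWUGHpFKg][host1][inet[/10.0.0.123:9300]]"
    "{aws_availability_zone=eu-west-1b, master=true}, reason: zen-disco-join (elected_as_master)",
]

view = masters_by_node(logs)
print(view)  # {'node2': 'node2', 'node1': 'node1'}
if len(set(view.values())) > 1:
    print("more than one master announced: possible split brain")
```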

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/504348af-0cf7-4192-a4c0-66cc38ca2413%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Phani_Nadiminti · March 25, 2015, 1:24pm

Hi,

Are the nodes on the same network? Is the cluster name the same on all
nodes? If yes, then follow the procedure below.

To avoid the split-brain problem, we have to set the minimum number of
master-eligible nodes needed to elect a master to N/2 + 1 (integer division),
where N is the number of master-eligible nodes. It seems you already have 2
master nodes, but if one master is down there is no other node in your
cluster to elect as master. To avoid the split-brain problem, add the
following properties to the elasticsearch.yml configuration on each node in
the cluster:

 discovery.zen.minimum_master_nodes: 2
 discovery.zen.ping.timeout: 30s   # adjust based on network latency

In order to avoid split brain, we have to introduce another master-eligible
node in your cluster and set the above properties; then the split brain will
be gone. If we have 3 master-eligible nodes in the cluster and set
discovery.zen.minimum_master_nodes: 2, then at any point in time at least 2
master-eligible nodes must be reachable to elect a master and form the
cluster. This also gives effective failover.
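The N/2 + 1 rule behind `discovery.zen.minimum_master_nodes` can be checked with a few lines of Python. This is a sketch of the arithmetic only; `minimum_master_nodes` and `can_elect_master` are hypothetical helpers, and the election model is deliberately simplified to "count the reachable master-eligible nodes, including yourself".

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: N // 2 + 1."""
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible, quorum):
    """Simplified model: a master can only be elected if at least `quorum`
    master-eligible nodes (the node itself included) are reachable."""
    return visible_master_eligible >= quorum


for n in (2, 3):
    q = minimum_master_nodes(n)
    # One node is restarting, so only n - 1 master-eligible nodes remain.
    print(n, q, can_elect_master(n - 1, q))
# 2 2 False  -> with 2 nodes, losing one blocks the election
# 3 2 True   -> with 3 nodes, the remaining 2 still form a quorum
```

In the same model, a restarted node that can only see itself (1 of 3) stays below the quorum of 2, so it will not elect itself, but it still has to discover the other nodes before it can join them.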

    Hope it helps!!!!


uwe_bartels · March 25, 2015, 2:18pm

Hello Phani,

Thanks for your answer.
Now node1 doesn't elect itself as master anymore, but it still doesn't see
the cluster either.

I said that I run one client node; this is node0 in my installation.
Previously this node was neither a master nor a data node, only a client
node for load balancing.

I have now configured node0 to be master-eligible as well.
According to the log, I now see 3 nodes again, all of which can take the
master role.

I start the cluster completely fresh and start node1 first, so that it binds
port 9300. When the cluster is ready, I stop node1 and restart it. Again, it
does not find the other 2 nodes running on the same machine on ports 9301
and 9302.

It doesn't elect itself as master anymore, but my cluster is still broken.

Node1 binds port 9300 successfully but doesn't seem to scan the port range
9300-9400 for other existing cluster members.
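To check by hand which ports in 9300-9400 actually have a listener on the machine, a plain TCP connect scan is enough. This is a diagnostic sketch only: it is not how Elasticsearch discovery works, it only shows that something accepts connections on a port (it does not speak the transport protocol), and `listening_ports` is a hypothetical helper.

```python
import socket


def listening_ports(host, start=9300, end=9400, timeout=0.2):
    """Return the ports in [start, end] on `host` that accept a TCP connection."""
    found = []
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if s.connect_ex((host, port)) == 0:
                found.append(port)
    return found


if __name__ == "__main__":
    # Run on the machine that hosts node0, node1 and node2.
    print(listening_ports("127.0.0.1"))
```

With all three nodes up, the scan should list the ports they bound; comparing that list before and after restarting node1 shows whether the other nodes are still reachable on 9301 and 9302.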

Best...
Uwe

