Split brain problem on multi-node-single-machine installation

Hiho,

Because this Elasticsearch installation runs on physical hardware rather
than on virtual machines, I'm running 1 client node (master:false,
data:false) and 2 data-master nodes (master:true, data:true) on a single
machine.

The two data-master nodes listen on 0.0.0.0 without a configured port,
so that

  • node1 is automatically listening on port 9300 and
  • node2 is listening on port 9301.

Now I'm implementing a rolling restart and running into the split-brain
problem:

  • node1 stops listening on 9300, and node2, still listening on port 9301,
    elects itself master
  • when node1 starts again, it no longer sees the cluster and elects itself
    master too; it doesn't seem to check the port range 9300-9400

So what is the right way to configure Elasticsearch to avoid the
split-brain problem?

Any help is appreciated.
Best...
Uwe

This is the corresponding log output, with hostnames and node names
changed.

startup node1

[2015-03-25 11:35:00,889][INFO ][transport ] [node1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.0.123:9300]}
[2015-03-25 11:35:01,214][INFO ][discovery ] [node1] chef-test/PPrAcCFuRbitjRyL9v0Dnw

startup node2

[2015-03-25 11:35:05,374][INFO ][transport ] [node2] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/10.0.0.123:9301]}
[2015-03-25 11:35:05,709][INFO ][discovery ] [node2] chef-test/9XW7aeBMSIq0LAr1jumuow
[2015-03-25 11:35:20,203][INFO ][cluster.service ] [node2] new_master [node2][9XW7aeBMSIq0LAr1jumuow][host1][inet[/10.0.0.123:9301]]{aws_availability_zone=eu-west-1b, master=true}, reason: zen-disco-join (elected_as_master)
[2015-03-25 11:35:20,260][INFO ][cluster.service ] [node2] added {[node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true},}, reason: zen-disco-receive(join from node[[node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true}])

stop node1, log on node2

[2015-03-25 11:35:52,660][INFO ][cluster.service ] [node2] removed {[node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true},}, reason: zen-disco-node_left([node1][PPrAcCFuRbitjRyL9v0Dnw][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true})

start node1, log on node1

[2015-03-25 11:38:36,043][INFO ][transport ] [node1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.0.0.123:9300]}
[2015-03-25 11:38:36,156][INFO ][discovery ] [node1] chef-test/iqurXSk9RNuf7YWUGHpFKg
[2015-03-25 11:38:43,829][INFO ][cluster.service ] [node1] new_master [node1][iqurXSk9RNuf7YWUGHpFKg][host1][inet[/10.0.0.123:9300]]{aws_availability_zone=eu-west-1b, master=true}, reason: zen-disco-join (elected_as_master)

The log on node2 does not change, of course, during the startup of node1.
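
The state in these logs, where node1 and node2 each report themselves as
new_master, could be checked from outside with something like the sketch
below. It is only a rough illustration: the helper names are made up, and it
assumes each node's HTTP interface is reachable (the HTTP URLs are not part of
the logs above) and that the cluster state API returns a master_node field
when asked for that metric with local=true.

```python
import json
from urllib.request import urlopen


def fetch_master_id(base_url, timeout=5.0):
    """Ask one node which node id it currently regards as master.

    base_url is an assumption, e.g. "http://10.0.0.123:9200"; the HTTP
    ports of the nodes are not shown in the logs.
    """
    url = base_url + "/_cluster/state/master_node?local=true"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["master_node"]


def distinct_masters(master_by_node):
    """Return the set of master ids reported across all nodes."""
    return set(master_by_node.values())


def is_split_brain(master_by_node):
    """True if the nodes disagree about who the master is."""
    return len(distinct_masters(master_by_node)) > 1


# Node ids taken from the log output above: after the restart, node1
# considers itself master while node2 still considers itself master.
views = {
    "node1": "iqurXSk9RNuf7YWUGHpFKg",
    "node2": "9XW7aeBMSIq0LAr1jumuow",
}
print(is_split_brain(views))  # True
```

In a live check, the views dictionary would be filled by calling
fetch_master_id once per node instead of using ids copied from the logs.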

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/504348af-0cf7-4192-a4c0-66cc38ca2413%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi,

Are the nodes on the same network? Is the cluster name the same on all
nodes? If yes, then follow the procedure below.

To avoid the split-brain problem, we have to apply the N/2+1 rule when
choosing the minimum number of master nodes for the cluster. It seems you
already have 2 master nodes, but if one master is down there is no other
node in your cluster to elect as master. To avoid the split-brain problem,
we have to add the following properties to the elasticsearch.yml
configuration on each node in the cluster.

 discovery.zen.minimum_master_nodes: 2
 discovery.zen.ping.timeout: 30s  (we can set based on network latency).

To avoid split brain, we have to introduce another master node in your
cluster and set the properties above; then the split brain will be gone.
That is, if we have 3 master nodes in the cluster and set
discovery.zen.minimum_master_nodes: 2, then at any point in time at least
2 master-eligible nodes must be available to form a cluster. This also
gives effective failover.
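
The N/2+1 rule can be written out as a small calculation. This is only a
sketch of the arithmetic (with integer division), and the function names are
made up for illustration:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size N/2 + 1 (integer division) for N master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes may be down while a quorum remains."""
    return master_eligible - minimum_master_nodes(master_eligible)


for n in (2, 3):
    print(n, minimum_master_nodes(n), tolerated_failures(n))
# 2 2 0
# 3 2 1
```

With 2 master-eligible nodes the quorum is 2, so no node may be down; with 3
the quorum is still 2 and one node can be restarted at a time.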

    Hope it helps!!!!


Hello Pani,

thanks for your answer.
Now node1 doesn't elect itself as master anymore, but it still doesn't
see the cluster either.

As I said, I run one client node; this is node0 in my installation.
Previously this node was neither a master nor a data node, only a client
node for load balancing.

I have now configured node0 to be a master node as well.
According to the log, I now see 3 nodes again, all able to take the
master role.

I start the cluster completely fresh and start node1 first, so that it
binds port 9300. When the cluster is ready, I stop node1 and restart it.
Again it does not find the other 2 nodes running on the same machine on
ports 9301 and 9302.

It doesn't elect itself as master anymore, but my cluster is still broken.

node1 binds port 9300 successfully but doesn't seem to scan the port
range 9300-9400 for other existing cluster members.
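
To at least rule out that nothing is listening on the other ports, the range
can be probed by hand. The following is a minimal sketch with a made-up helper
name; it only shows which ports accept TCP connections and says nothing about
how Elasticsearch itself discovers other nodes:

```python
import socket


def open_ports(host, start=9300, end=9400, timeout=0.2):
    """Return the ports in [start, end] on host that accept a TCP connection."""
    found = []
    for port in range(start, end + 1):
        try:
            # create_connection raises OSError if the port is closed
            # or the attempt times out.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass
    return found


if __name__ == "__main__":
    # 127.0.0.1 is used here because all nodes run on the same machine;
    # the published address from the logs (10.0.0.123) would also work.
    print(open_ports("127.0.0.1"))
```

If the ports of the other two nodes show up in the result while node1 still
does not join, the nodes are reachable and the problem lies in discovery
rather than in the network binding.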

Best...
Uwe
