Strange split brain scenario

Eran_Kutner_2 · November 6, 2011, 4:18pm

Hi,
We're using version 0.17.6 with two servers and ran into a strange problem. One
of the servers (es1-01) identified itself as the master and its peer
(es1-02) as the slave:
{
  "cluster_name" : "gcs",
  "master_node" : "j0VcAKNRSsKnbHyaeof6pQ",
  "blocks" : {
  },
  "nodes" : {
    "zi6VNqehTCaZjNS0nbGUhg" : {
      "name" : "es1-02",
      "transport_address" : "inet[/10.1.101.152:9300]",
      "attributes" : {
      }
    },
    "j0VcAKNRSsKnbHyaeof6pQ" : {
      "name" : "es1-01",
      "transport_address" : "inet[/10.1.101.151:9300]",
      "attributes" : {
      }
    }
  },

The other one (es1-02) saw only itself and reported itself as the master:
{
  "cluster_name" : "gcs",
  "master_node" : "zi6VNqehTCaZjNS0nbGUhg",
  "blocks" : {
  },
  "nodes" : {
    "zi6VNqehTCaZjNS0nbGUhg" : {
      "name" : "es1-02",
      "transport_address" : "inet[/10.1.101.152:9300]",
      "attributes" : {
      }
    }
  },

Only resetting es1-02 caused it to properly identify the other server.
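
For anyone who wants to catch this condition automatically, here is a minimal sketch in Python that compares the cluster state as reported by each node and flags disagreement on the master or on cluster membership. It is only an illustration: the helper names (summarize, find_disagreements) are made up for this example, the two dictionaries are trimmed copies of the outputs above, and fetching the state from each node over HTTP is left out on purpose.

```python
def summarize(state):
    """Return (master node id, frozenset of known node ids) for one cluster-state dict."""
    return state["master_node"], frozenset(state["nodes"])


def find_disagreements(views):
    """Compare cluster-state views collected from several nodes.

    `views` maps a label (the node that was queried) to the cluster-state
    dict that node returned. Returns a list of problem descriptions;
    an empty list means all views agree.
    """
    summaries = [summarize(state) for state in views.values()]
    problems = []

    masters = {master for master, _ in summaries}
    if len(masters) > 1:
        problems.append("nodes disagree on master: " + ", ".join(sorted(masters)))

    memberships = {nodes for _, nodes in summaries}
    if len(memberships) > 1:
        problems.append("nodes disagree on cluster membership")

    return problems


# Trimmed copies of the two outputs quoted in this post (assumption: only the
# fields needed for the comparison are kept).
VIEW_ES1_01 = {
    "cluster_name": "gcs",
    "master_node": "j0VcAKNRSsKnbHyaeof6pQ",
    "nodes": {
        "zi6VNqehTCaZjNS0nbGUhg": {"name": "es1-02"},
        "j0VcAKNRSsKnbHyaeof6pQ": {"name": "es1-01"},
    },
}
VIEW_ES1_02 = {
    "cluster_name": "gcs",
    "master_node": "zi6VNqehTCaZjNS0nbGUhg",
    "nodes": {
        "zi6VNqehTCaZjNS0nbGUhg": {"name": "es1-02"},
    },
}

if __name__ == "__main__":
    for problem in find_disagreements({"es1-01": VIEW_ES1_01, "es1-02": VIEW_ES1_02}):
        print(problem)
```

Run against the two views above, the check reports both a master disagreement and a membership disagreement, which matches the situation described: each node considers a different node to be the master, and es1-02 does not list es1-01 at all.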

Now es1-01 is the master of all the shards in all the indexes, and 3 days
after the problem the cluster still hasn't rebalanced. Is that expected? Is
there a way to force rebalancing?

Thanks,
Eran

kimchy · November 9, 2011, 6:17am

Do you have the logs from those two servers?


Eran_Kutner_2 · November 14, 2011, 10:34am

The relevant parts of the logs from es1-02 are here:
http://pastebin.com/aZshxLqn
and the parts from es1-01 are here: http://pastebin.com/et1MWMDG

Note that the reset we initiated was on Nov. 2nd at 4:24am. I can't be sure,
but we don't recall resetting the service on Nov. 1st at 00:21, when the log
of es1-01 indicates a reset. Also, the logs don't show a "stopping"
message before those lines. Does ES have some built-in watchdog that could
do this?

Let me know if there is any additional information I can provide.

Thanks.

-eran
