Nodes fail to join cluster - potential split-brain scenario

8-node cluster running 0.20.0.RC1, with MINIMUM_MASTER_NODES set to 5.
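(MINIMUM_MASTER_NODES here maps to the usual zen discovery setting, i.e.
each node's elasticsearch.yml contains:

discovery.zen.minimum_master_nodes: 5

With 8 nodes, 5 is a strict majority, so a minority partition should
refuse to elect its own master.)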

At a certain point, two nodes (search7 and search8) left the cluster. The
reason is unknown; it happened while I was increasing the replica count on
a new index, but that is not my focus right now. I stopped the process on
both search7 and search8 and started them up one at a time.

Upon restarting search7, it seemed to think that search8 was the master.
Since the search8 process was down, search7 did not join the cluster.

[2013-02-25 09:09:05,108][WARN ][discovery.zen ] [search7] failed to connect to master [[search8][GYhoDKLWRFOCy7KtUgVVQg][inet[/ipaddress:9300]]], retrying...
org.elasticsearch.transport.ConnectTransportException: [search8][inet[/ipaddress:9300]] connect_timeout[5s]
Next I started search8 and attempted to restart search7. Ignoring
search8's logs for now, search7 still could not join the cluster, this
time for a different reason (not master):

[2013-02-25 09:10:47,816][INFO ][discovery.zen ] [search7] failed to send join request to master [[search8][GYhoDKLWRFOCy7KtUgVVQg][inet[/ipaddress:9300]]], reason [org.elasticsearch.transport.RemoteTransportException: [search8][inet[/ipaddress:9300]][discovery/zen/join]; org.elasticsearch.ElasticSearchIllegalStateException: Node [[search8][feMQtDFOTs2xyh0xcTXdkA][inet[/ipaddress:9300]]] not master for join request from [[search7][4lEhStfwQDKuHpGvHeU-hQ][inet[/ipaddress:9300]]]]
[2013-02-25 09:10:47,816][TRACE][discovery.zen ] [search7] detailed failed reason
org.elasticsearch.transport.RemoteTransportException: [search8][inet[/ipaddress:9300]][discovery/zen/join]
Caused by: org.elasticsearch.ElasticSearchIllegalStateException: Node [[search8][feMQtDFOTs2xyh0xcTXdkA][inet[/ipaddress:9300]]] not master for join request from [[search7][4lEhStfwQDKuHpGvHeU-hQ][inet[/ipaddress:9300]]]

More logging output is at
https://gist.github.com/1ff1be9444078d7d0077
Focusing only on search7 for now. It is sending ping requests to all nodes
in the cluster, and they all seem to respond that search8 is the master.
Yet the other six nodes have formed an ES cluster without search8 as the
master. Why are they returning search8 as the master? (Curiously, the join
request above targets search8's old node ID, GYhoDKLW..., while the
running search8 identifies itself as feMQtDFO..., presumably because
search8 received a new ID when it restarted.)
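(For reference, the check is just hitting each node's HTTP port in turn
and comparing the master_node field in the cluster state response,
something like the following; ports and the exact filter params may vary:

curl -s 'http://search7:9200/_cluster/state?filter_metadata=true&filter_routing_table=true&pretty=true'

and repeating for search1 through search8.)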

If this is a split-brain scenario, why didn't setting minimum_master_nodes
help? How does one recover from this scenario? We deleted the new index,
and the cluster returned to a green state. I assume that deleting the data
directories on search7 and search8 would instead have put the cluster into
a yellow state. What does it take for the master election process to
start?

Cheers,

Ivan


More split-brain weirdness on another cluster. Four-node cluster, no
replicas on the indices. The cluster is in a red state because one node
dropped out and, with no replicas, a couple of shards went missing. Using
the cluster API, servers 1, 2, and 3 think they form a cluster, while
server 4 thinks the cluster is formed between 1, 2, and 4. Restarting
server 4 returned everything to a green state.
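(By "using the cluster API" I mean asking each server which nodes it knows
about and diffing the lists, roughly:

curl -s 'http://server1:9200/_cluster/nodes?pretty=true'
curl -s 'http://server4:9200/_cluster/nodes?pretty=true'

Each response lists the nodes that the queried server currently sees.)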

Any help with red-state clusters would be appreciated.

--
Ivan


On Tue, 2013-02-26 at 14:59 -0800, Ivan Brusic wrote:

> More split-brain weirdness on another cluster. Four node cluster, no
> replicas on the indices. Cluster is in a red-state because one node
> dropped out and there are no replicas, so a couple of shards were
> missing. Using the cluster API, servers 1,2,3 think they form a
> cluster. Server 4 thinks the cluster is formed between 1,2,4.
> Restarting server 4 returned everything back to a green state.

How long are you waiting before asking server 4 for the cluster state?
It doesn't fail immediately, in case there is just a temporary network
outage which it can recover from, but after a while it should recognise
that it is no longer part of the cluster.
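If you want to tune how quickly that happens, the zen fault-detection
settings are the relevant knobs; from memory the defaults are roughly:

discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 3

so a dead node should be dropped after ping_retries * ping_timeout at the
outside.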

Try running all of the logs through es2unix using the "lifecycle" option:
https://github.com/elasticsearch/es2unix

clint


I waited an hour before asking for its state. In both cases, Elasticsearch
fails to recognize the correct cluster. Nodes removing themselves is
ultimately the bigger concern, but that is harder to debug.


Hi Ivan

On Wed, 2013-02-27 at 07:38 -0800, Ivan Brusic wrote:

> Waited an hour before asking for its state. In both cases,
> Elasticsearch fails to recognize the correct cluster. Nodes removing
> themselves is ultimately the bigger concern, but harder to debug.

We really need more information here. What version of ES are you using?
Did you manage to run the logs through 'lifecycle' on es2unix?

clint


The version is the same as in the first post: 0.20.0.RC1. I have not seen
any relevant commits since then. I am far more interested in the failure
from the first post than the second.

I will use es2unix tomorrow. In the meantime, here are some logs about the
second failure. In the last cluster state, you can see that the master is
wrong.

Cheers,

Ivan


Here is the output from the second cluster. Note that the node numbering
is slightly different (+2) from the server numbers above.

search3 ~]$ es lifecycle /var/elasticsearch/logs/ESCluster.log.2013-02-26
2013-02-26 13:32:34,869 search3 REMOVE search5
2013-02-26 14:52:57,046 search3 ADD search6

search4 ~]$ es lifecycle /var/elasticsearch/logs/ESCluster.log.2013-02-26
2013-02-26 13:32:34,878 search4 REMOVE search5
2013-02-26 14:52:57,043 search4 ADD search6

search5 ~]$ es lifecycle /var/elasticsearch/logs/ESCluster.log.2013-02-26
2013-02-26 14:52:57,041 search5 ADD search6

search6 ~]$ es lifecycle /var/elasticsearch/logs/ESCluster.log.2013-02-26
2013-02-26 13:32:34,889 search6 REMOVE search5
2013-02-26 14:52:09,542 search6 STOP
2013-02-26 14:52:48,637 search6 INIT 0.20.0.RC1
2013-02-26 14:52:53,754 search6 BIND :9300
2013-02-26 14:52:57,083 search6 MASTER search5
2013-02-26 14:52:57,199 search6 START


Output for the original post, first cluster:


Ivan Brusic wrote:

> Here is the output from the second cluster. Note that the numbering is
> slightly different (+2)

Just FYI, the purpose of the lifecycle command is to interleave the logs
to replay the timeline of nodes coming and going. If you supply all the
logfiles as arguments to a single invocation, you should see something
like:

% es lifecycle
<(ssh search3 cat /var/elasticsearch/logs/ESCluster.log.2013-02-26)
<(ssh search4 cat /var/elasticsearch/logs/ESCluster.log.2013-02-26)
<(ssh search5 cat /var/elasticsearch/logs/ESCluster.log.2013-02-26)
<(ssh search6 cat /var/elasticsearch/logs/ESCluster.log.2013-02-26)
2013-02-26 13:32:34,869 search3 REMOVE search5
2013-02-26 13:32:34,878 search4 REMOVE search5
2013-02-26 13:32:34,889 search6 REMOVE search5
2013-02-26 14:52:09,542 search6 STOP
2013-02-26 14:52:48,637 search6 INIT 0.20.0.RC1
2013-02-26 14:52:53,754 search6 BIND :9300
2013-02-26 14:52:57,041 search5 ADD search6
2013-02-26 14:52:57,043 search4 ADD search6
2013-02-26 14:52:57,046 search3 ADD search6
2013-02-26 14:52:57,083 search6 MASTER search5
2013-02-26 14:52:57,199 search6 START

Preferably you'd do it over a few days' worth of logs to get a real
picture of when things occurred. Perhaps not in this case, but for a
long and complex history this is much easier to parse.
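With dated filenames like yours, a glob widens the window, e.g. something
like:

% es lifecycle
<(ssh search3 'cat /var/elasticsearch/logs/ESCluster.log.2013-02-*')
<(ssh search4 'cat /var/elasticsearch/logs/ESCluster.log.2013-02-*')
<(ssh search5 'cat /var/elasticsearch/logs/ESCluster.log.2013-02-*')
<(ssh search6 'cat /var/elasticsearch/logs/ESCluster.log.2013-02-*')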

-Drew


Hiya

On Wed, 2013-02-27 at 23:47 -0800, Ivan Brusic wrote:

> The version is the same as the first post: 0.20RC1.

Ah, missed that.

> I have not seen any relevant commits since then. I am far more
> interested in the failure of the first post and not the second.

What about this one?
https://github.com/elasticsearch/elasticsearch/issues/2592

I suggest upgrading to 0.20.5

clint


On Fri, Mar 1, 2013 at 5:17 AM, Clinton Gormley clint@traveljury.com wrote:

Hey,

Thanks, Drew. I didn't have a chance to explore the tool yet. What I would
love to see in the output is master election.

> > I have not seen any relevant commits since then. I am far more
> > interested in the failure of the first post and not the second.
>
> What about this one?
> https://github.com/elasticsearch/elasticsearch/issues/2592

That issue does not address the split-brain scenario or why a node
disconnected in the first place. These two issues are relevant:

> I suggest upgrading to 0.20.5

The issues above have not been addressed, and I do not see anything else
related to cluster management in the commits/issues. A rolling upgrade is
possible, so I will probably upgrade anyway, but I fear that the issues
will remain.
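(For the record, the rolling upgrade plan per node is roughly the sketch
below; I am assuming the dynamic disable_allocation setting behaves as
advertised on 0.20.x:

# keep shards from being shuffled around while each node restarts
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.disable_allocation" : true }
}'
# stop the node, install 0.20.5, start it, wait for it to rejoin,
# then re-enable allocation:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient" : { "cluster.routing.allocation.disable_allocation" : false }
}'
)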

Cheers,

Ivan
