I saw an issue with our cluster in production today. We have a cluster of 3 (master & data) nodes, and the minimum number of master nodes is set to 2 in the config.
At some point, due to a network issue, server 3 couldn't connect to 2 but could still connect to 1, so it had an active cluster with 1 & 3, with 3 as the elected master. On 2, bigdesk showed a cluster with 1 & 2, with 2 as the elected master. Is this a split-brain scenario?
I assumed not, because the cluster name is the same and node 1 is part of both. bigdesk on 2 & 3 showed green. However, we saw disk I/O shoot up to 100%, and data was displayed inconsistently depending on which node it was read from. After I restarted the service on node 3, everything was fine, and bigdesk on all machines showed all 3 nodes in the cluster.
Can anyone please help me understand how the cluster can get into such a state, and how to avoid this scenario?
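For reference, this is roughly what the relevant elasticsearch.yml settings look like on each of the 3 nodes (the cluster name and host names below are placeholders, and the discovery settings beyond minimum_master_nodes are illustrative rather than an exact copy of our config):

    cluster.name: mycluster                            # same on all 3 nodes
    node.master: true                                  # all 3 nodes are master-eligible
    node.data: true                                    # all 3 nodes hold data
    discovery.zen.minimum_master_nodes: 2              # quorum of the 3 master-eligible nodes
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3"]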
Thank you for the reply. Yes, they are in the same DC. I thought the (n / 2) + 1 quorum rule would apply and such an issue would not occur. How can a node be part of 2 clusters at the same time? Are there any config changes I can make to avoid such an issue?
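(With 3 master-eligible nodes, the quorum works out to floor(3 / 2) + 1 = 2, which matches what we have configured, so I expected the side of any partition with only 1 node to give up its master.)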
On Wednesday, February 18, 2015 at 7:13:39 PM UTC-6, Mark Walkom wrote:
This is a split brain scenario.
Are your nodes in the same DC?
We are on version 1.3.2. I have looked in other threads for similar issues, but they seem to be about very old versions. Please let me know how to avoid such split-brain issues. Thank you.
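In case it is useful to anyone hitting the same thing, a rough way to spot this kind of divergence is to ask each node which master it currently recognises (host names below are placeholders):

    curl -s 'http://node1:9200/_cat/master?v'
    curl -s 'http://node2:9200/_cat/master?v'
    curl -s 'http://node3:9200/_cat/master?v'

If the outputs disagree, the nodes are not acting as a single cluster, even though each side can still report green from 'http://nodeX:9200/_cluster/health?pretty', which matches what bigdesk showed us during the incident.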