I saw an issue with our cluster in production today. We have a cluster of 3 (master & data) nodes, and the minimum number of master nodes is set to 2 in the config.
At some point, due to a network issue, server 3 couldn't connect to 2 but could still connect to 1, so it had an active cluster with 1 & 3, with 3 as the elected master. On 2, bigdesk showed a cluster with 1 & 2, with 2 as the elected master. Is this a split-brain scenario?
I assumed not, because the cluster name is the same and node 1 is part of both. bigdesk on 2 & 3 showed green. However, we saw disk I/O shoot up to 100%, and data was displayed inconsistently depending on which node it was read from. After I restarted the service on node 3, everything was fine, and bigdesk on all machines showed all 3 nodes in the cluster.
Can anyone please help me understand how the cluster can get into such a state, and how to avoid this scenario?
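For reference, this is roughly what the relevant elasticsearch.yml settings look like on each of the 3 nodes (the cluster name and host names below are placeholders, and the discovery settings beyond minimum_master_nodes are illustrative rather than an exact copy of our config):

    cluster.name: mycluster                            # same on all 3 nodes
    node.master: true                                  # all 3 nodes are master-eligible
    node.data: true                                    # all 3 nodes hold data
    discovery.zen.minimum_master_nodes: 2              # quorum of the 3 master-eligible nodes
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3"]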
Thank you for the reply. Yes, they are in the same DC. I thought the (n / 2) + 1 quorum rule would apply and such an issue would not occur. How can a node be part of 2 clusters at the same time? Are there any config changes I can make to avoid such an issue?
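(With 3 master-eligible nodes, the quorum works out to floor(3 / 2) + 1 = 2, which matches what we have configured, so I expected the side of any partition with only 1 node to give up its master.)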
On Wednesday, February 18, 2015 at 7:13:39 PM UTC-6, Mark Walkom wrote:
This is a split brain scenario.
Are your nodes in the same DC?
We are on version 1.3.2. I have looked in other threads for similar issues, but they seem to be about very old versions. Please let me know how to avoid such split-brain issues. Thank you.
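In case it is useful to anyone hitting the same thing, a rough way to spot this kind of divergence is to ask each node which master it currently recognises (host names below are placeholders):

    curl -s 'http://node1:9200/_cat/master?v'
    curl -s 'http://node2:9200/_cat/master?v'
    curl -s 'http://node3:9200/_cat/master?v'

If the outputs disagree, the nodes are not acting as a single cluster, even though each side can still report green from 'http://nodeX:9200/_cluster/health?pretty', which matches what bigdesk showed us during the incident.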