Unexpected cluster state

Hello,

We're running a three node cluster with the following discovery settings:

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: <all three IPs, in each node's config>
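
(As a quick sanity check of the first setting — this sketch and the `quorum` function name are mine, not from the original post — minimum_master_nodes should be a strict majority of the master-eligible nodes, i.e. floor(n/2) + 1:)

```shell
# Strict majority of n master-eligible nodes: floor(n / 2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # prints 2, matching the setting above
```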

Yesterday we had a networking blip that affected at least one of the nodes.
After the networking issue resolved, nodes 1 and 2 were connected to each
other and in a green cluster state. Node 3 was connected to node 2,
reporting 2 nodes in the cluster and a yellow state. Querying nodes 1 and 2
showed that 1 and 2 were members; node 3 reported that 2 and 3 were
members. Cluster health on node 1 reported unallocated shards, while node 3
returned a 200 status.
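
(For reference, these per-node views can be compared with the cluster
health API. The hostnames below are placeholders; querying each node
directly shows that node's own view of cluster membership:)

```shell
# Hypothetical hostnames for the three nodes. Hitting each node
# directly reveals which cluster that node believes it belongs to.
for host in node1 node2 node3; do
  echo "== $host =="
  curl -s "http://$host:9200/_cluster/health?pretty=true"
done
```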

We restarted the service on 3 and it rejoined the cluster properly.

Does this scenario sound familiar to anyone? How is it that nodes 1 & 2 and
nodes 2 & 3 could each form their own cluster? Is there any way to avoid
this situation?

Thanks,
Dave

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Dave,

I think that you're seeing this issue:
https://github.com/elasticsearch/elasticsearch/issues/2488. We were
affected by this as well.

I'm currently trying an alternative to the default discovery mechanism,
assessing ZooKeeper and the corresponding plugin
(https://github.com/sonian/elasticsearch-zookeeper) with our cluster (as
suggested in that ticket), which so far has proved successful in avoiding
this situation.
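
(The plugin is configured through elasticsearch.yml. The setting names
below are from my memory of the plugin's README, and the ZooKeeper hosts
are placeholders — check them against the README for your version:)

```yaml
# Setting names from memory of the plugin README; verify before use.
discovery.type: com.sonian.elasticsearch.zookeeper.discovery.ZooKeeperDiscoveryModule
sonian.elasticsearch.zookeeper.client.host: zkhost1:2181,zkhost2:2181,zkhost3:2181
```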

  • oli

On Tue, Aug 13, 2013 at 12:15 PM, Dave Konopka <dave.konopka@gmail.com> wrote:


Oli,

Thanks for the pointer. I'll definitely dig into this issue thread.

I went through the server logs and they seem to align with the basic
premise of that issue description.

Node 1

  • 3 failed pings to Node 3, removed from the cluster
  • Few node disconnect exceptions

Node 2

  • Remove Node 3, told to do so by Node 1
  • New master announces itself, Node 3
  • Few suspect illegal state warnings
  • New master announces itself, Node 1
  • Master left, Node 3
  • Added Node 3, told to do so by Node 1

Node 3

  • Master left: Node 1
  • New master: Node 3
  • Shutdown, startup sequence
  • Detected master: Node 1

Dave

On Tue, Aug 13, 2013 at 3:55 PM, Oli McCormack oli@climate.com wrote:


Which version of elasticsearch are you running? I found the logs not very
helpful when it comes to insight into the master election process.

One useful tool is the 'lifecycle' command of this script:
https://github.com/elasticsearch/es2unix
--
Ivan

On Tue, Aug 13, 2013 at 1:15 PM, Dave Konopka <dave.konopka@gmail.com> wrote:


Ivan,

We're running 0.90.1.

Thanks for pointing out es2unix. This looks handy.

Dave

On Wed, Aug 14, 2013 at 1:08 PM, Ivan Brusic ivan@brusic.com wrote:

