Unexpected cluster state

Hello,

We're running a three node cluster with the following discovery settings:

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: <all three IPs, in each node's config>
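
(As a quick sanity check of the first setting — this sketch and the `quorum` function name are mine, not from the original post — minimum_master_nodes should be a strict majority of the master-eligible nodes, i.e. floor(n/2) + 1:)

```shell
# Strict majority of n master-eligible nodes: floor(n / 2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # prints 2, matching the setting above
```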

Yesterday we had a networking blip that affected at least one of the nodes.
After the networking issue resolved, nodes 1 and 2 were connected to each
other and in a green cluster state. Node 3 was connected to node 2,
reporting 2 nodes in the cluster and a yellow state. Querying nodes 1 and 2
showed that 1 and 2 were members; node 3 reported that 2 and 3 were
members. Cluster health on node 1 reported unallocated shards, while node 3
returned a 200 status.
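
(For reference, these per-node views can be compared with the cluster
health API. The hostnames below are placeholders; querying each node
directly shows that node's own view of cluster membership:)

```shell
# Hypothetical hostnames for the three nodes. Hitting each node
# directly reveals which cluster that node believes it belongs to.
for host in node1 node2 node3; do
  echo "== $host =="
  curl -s "http://$host:9200/_cluster/health?pretty=true"
done
```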

We restarted the service on 3 and it rejoined the cluster properly.

Does this scenario sound familiar to anyone? How is it that nodes 1 & 2 and
nodes 2 & 3 could each form their own cluster? Is there any way to avoid
this situation?

Thanks,
Dave

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Dave,

I think that you're seeing this issue:
https://github.com/elasticsearch/elasticsearch/issues/2488. We were
affected by this as well.

I'm currently trying an alternative to the default discovery mechanism,
assessing ZooKeeper and the corresponding plugin
(https://github.com/sonian/elasticsearch-zookeeper) with our cluster (as
suggested in that ticket), which so far has proved successful in avoiding
this situation.
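
(The plugin is configured through elasticsearch.yml. The setting names
below are from my memory of the plugin's README, and the ZooKeeper hosts
are placeholders — check them against the README for your version:)

```yaml
# Setting names from memory of the plugin README; verify before use.
discovery.type: com.sonian.elasticsearch.zookeeper.discovery.ZooKeeperDiscoveryModule
sonian.elasticsearch.zookeeper.client.host: zkhost1:2181,zkhost2:2181,zkhost3:2181
```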

  • oli

On Tue, Aug 13, 2013 at 12:15 PM, Dave Konopka <dave.konopka@gmail.com> wrote:


Oli,

Thanks for the pointer. I'll definitely dig into this issue thread.

I went through the server logs and they seem to align with the basic
premise of that issue description.

Node 1

  • 3 failed pings to Node 3, removed from the cluster
  • Few node disconnect exceptions

Node 2

  • Remove Node 3, told to do so by Node 1
  • New master announces itself, Node 3
  • Few suspect illegal state warnings
  • New master announces itself, Node 1
  • Master left, Node 3
  • Added Node 3, told to do so by Node 1

Node 3

  • Master left: Node 1
  • New master: Node 3
  • Shutdown, startup sequence
  • Detected master: Node 1

Dave

On Tue, Aug 13, 2013 at 3:55 PM, Oli McCormack oli@climate.com wrote:


Which version of elasticsearch are you running? I found the logs not very
helpful when it comes to insight into the master election process.

One useful tool is the 'lifecycle' command of this script:
https://github.com/elasticsearch/es2unix
--
Ivan

On Tue, Aug 13, 2013 at 1:15 PM, Dave Konopka <dave.konopka@gmail.com> wrote:


Ivan,

We're running 0.90.1.

Thanks for pointing out es2unix. This looks handy.

Dave

On Wed, Aug 14, 2013 at 1:08 PM, Ivan Brusic ivan@brusic.com wrote:

