0.90.2: Discovery BUG when network outage occurs?

(amos.wood) #1


We have been using ES for quite a while now and have hit multiple situations where network
outages left the data nodes unwilling to rejoin the cluster, even after the network outage
was resolved. The only way to resolve this issue was to reboot the cluster (or maybe just the current

While trying to track this issue down, I noticed that quite a few people
have posted similar issues on this forum, but no one was able to resolve it.


I have a small cluster (2 master/data nodes) running 0.90.2 using unicast

Steps to Reproduce

  1. From a node running on my local laptop, I connect to the cluster as a
    third, non-data node.
  2. I pull the network cable out of the back of my laptop and wait until I
    start getting "transport.netty" exceptions, which takes about 45 seconds.
  3. I then plug the network cable back in and wait until the initial
    connection to the cluster is re-established and the master is discovered.
  4. I then unplug the network again before the cluster state has been
    successfully received from the master.
  5. The node then fails to get the cluster state from the master, but it does
    not keep trying to reconnect. It never attempts to reconnect again, and
    you have to restart the third (non-data) node to get it to reconnect.
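The sequence in steps 2 to 5 can be modeled as a small state machine. The Python below is a hypothetical toy model, not Elasticsearch's actual discovery code: `FlakyNetwork`, `join`, and the `retry_on_state_failure` flag are illustrative names. It contrasts a join loop that treats a failed cluster-state fetch as terminal (the behavior observed here) with one that keeps retrying.

```python
"""Toy model of the reported rejoin sequence (hypothetical; not Elasticsearch code)."""
from dataclasses import dataclass
from typing import List


@dataclass
class FlakyNetwork:
    # Scripted link state: True = cable plugged in, False = unplugged.
    # Each call to up() consumes one scripted step.
    script: List[bool]
    step: int = 0

    def up(self) -> bool:
        # Once the script is exhausted, assume the link stays up.
        ok = self.script[self.step] if self.step < len(self.script) else True
        self.step += 1
        return ok


def join(net: FlakyNetwork, retry_on_state_failure: bool, max_attempts: int = 10) -> List[str]:
    """Hypothetical join loop: discover the master, then fetch the cluster state."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # Phase 1: ping to discover the master.
        if not net.up():
            log.append(f"attempt {attempt}: master not found, retrying")
            continue
        log.append(f"attempt {attempt}: master discovered")
        # Phase 2: fetch the cluster state from the master.
        if net.up():
            log.append(f"attempt {attempt}: cluster state received, joined")
            return log
        log.append(f"attempt {attempt}: failed to get cluster state")
        if not retry_on_state_failure:
            # The suspected bug: this failure ends the loop for good.
            log.append("giving up: no further join attempts")
            return log
    return log
```

With the link scripted as down, up, down (steps 2 to 4) and then up for good, the terminal variant stops after the failed cluster-state fetch, while the retrying variant rejoins on its third attempt:

```python
print(join(FlakyNetwork([False, True, False]), retry_on_state_failure=False)[-1])
print(join(FlakyNetwork([False, True, False]), retry_on_state_failure=True)[-1])
```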


The log file for the 3rd non-data node is attached.


Since I can reliably reproduce this issue, is it a bug, or is this expected behavior?
