0.90.2: Discovery BUG when network outage occurs?

amos_wood · September 24, 2013, 2:53pm

Background

We have been using ES for quite a while now, and we have had multiple
situations where a network outage resulted in the data nodes not
rejoining the cluster after the outage was resolved. The only way to
resolve this was to reboot the cluster (or maybe just the current
master).

While trying to track this issue down, I noticed that quite a few people
have posted similar issues on this forum, but no one was able to resolve
them.

Testing Scenario

I have a small cluster (2 master/data nodes) running 0.90.2 using unicast
discovery.

Steps to Reproduce

1. From a node running on my local laptop, I connect to the cluster as a
   3rd, non-data node.
2. I pull the network plug out of the back of my laptop and wait until I
   start to get "transport.netty" exceptions, which takes ~45 seconds.
3. I then plug the network back in and wait until the initial connection
   to the cluster is made again to discover the master.
4. I then unplug the network again before the cluster state has been
   successfully updated from the master.
5. The node then fails to get the cluster state from the master, but it
   doesn't keep trying to reconnect. It never attempts to reconnect
   again, and you have to reboot the 3rd non-data node to reconnect.
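
The behaviour expected after the last step, instead of giving up, can be sketched as a simple retry loop. This is only an illustration of the expectation, not Elasticsearch's actual discovery code: `rejoin_with_retry` and `join_once` are hypothetical names, and `join_once` stands in for "ping the unicast hosts, find the master, fetch the cluster state".

```python
import time


def rejoin_with_retry(join_once, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call join_once() until it succeeds or max_attempts is reached.

    join_once is a hypothetical callable (not an Elasticsearch API) that
    raises ConnectionError while the network is down.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            join_once()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                # Out of attempts: surface the failure to the caller.
                raise
            # Back off exponentially: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

In the scenario above, the second unplug would just be one more failed attempt, and the node would rejoin on a later attempt once the network is back, with no reboot required.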

Log

The log file for the 3rd non-data node is attached.

Conclusion

Since I can reliably reproduce this issue: is it a bug, or is this expected behavior?

