Split-brain rejoining?


(Antonio Lobato) #1

The topic pretty much sums it up. If I have a cluster that split-brains,
is there a way for the two halves to rejoin into a single cluster without
having to reboot the part that split off? One product that comes to mind
that does this is Hazelcast (another Java-based product).

If this isn't possible, then consider this a feature request! Nodes should
periodically try to relink to other nodes, via multicast if it's enabled,
or via unicast otherwise.

Thanks :smiley:

--


(Richard Vowles) #2

This appears to be a regular request (once every few days). We are in a
situation where we are running two nodes, one in each data centre. If the
link goes down for a prolonged period (we have been simulating this in
testing with low ping timeouts), the nodes never attempt to reconnect, and
both sides become master. Our next attempt is to force one of them never to
be master, so that it won't accept POST update requests. That is a nasty
hack, and it means devops have to do work to recover it. If that doesn't
work, we'll have to fall back to checking the cluster's status before
posting.
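
A minimal sketch of that last fallback, assuming the client knows how many nodes the healthy cluster should have. It reads the `status`, `timed_out` and `number_of_nodes` fields returned by `GET /_cluster/health`; the function names, the `expected_nodes` parameter and the rule for refusing writes are illustrative assumptions, not something Elasticsearch provides.

```python
import json
import urllib.request


def safe_to_write(health, expected_nodes):
    """Return True only if the health response shows the full cluster.

    `health` is the parsed JSON body of GET /_cluster/health.
    `expected_nodes` is a hypothetical, client-side setting (2 in the
    two-data-centre setup described above).
    """
    # A node on one side of a split still answers requests, but it
    # reports fewer nodes than the healthy cluster would.
    if health.get("timed_out"):
        return False
    if health.get("status") == "red":
        return False
    return health.get("number_of_nodes", 0) >= expected_nodes


def fetch_health(base_url, timeout=2.0):
    """Fetch and parse /_cluster/health from one node, e.g. http://localhost:9200."""
    url = base_url + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The client would call something like `safe_to_write(fetch_health("http://localhost:9200"), expected_nodes=2)` before each POST and hold the update back when it returns False. This only narrows the window: a split can still happen between the check and the write.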

Ideally there would be another setting that tells Zen discovery to retry
the connection every x seconds after a failure.

--

