In a truly pathological case, where the 2 halves of a cluster don't know about
each other, both sides elect themselves a master (unless, say, this Zen
extension prevents one side from doing that because it realises there isn't
a quorum on its side).
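To make the quorum idea concrete, here is a minimal sketch of a majority check. It is not the actual Zen code; the function name and parameters are hypothetical, and it assumes "quorum" means a strict majority of the master-eligible nodes.

```python
def has_quorum(visible_master_eligible: int, total_master_eligible: int) -> bool:
    """Return True if this side of a partition sees a strict majority.

    Hypothetical helper for illustration only, not part of Elasticsearch.
    """
    majority = total_master_eligible // 2 + 1
    return visible_master_eligible >= majority


# A 5-node cluster split 3/2: only the larger side may elect a master.
larger_side_may_elect = has_quorum(3, 5)
smaller_side_may_elect = has_quorum(2, 5)

# A 4-node cluster split 2/2: neither side has a majority, so neither elects.
even_split_may_elect = has_quorum(2, 4)
```

With a rule like this, at most one side of any partition can elect a master, so you never get 2 masters. The cost is that an even split leaves no side able to elect one at all.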
You now have 2 clusters that both think they're valid. With 2 masters, both
masters would be trying to re-replicate shards, working towards green
health and shuffling things around. This might not be SO bad (but it
probably is), but if you have clients of these 2 clusters, perhaps split
between the 2 evil clusters, then, say, half the updates are going to one
and half to the other.
You now have 2 clusters, neither of which has the true index state. I'm
pretty sure the only recourse then, after addressing the connectivity issues
(fixing the split brain), is to blow away the index and reindex, because you
can't trust either half.
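A toy illustration of why neither half can be trusted, using plain dicts as stand-ins for the index on each side. This is only a sketch: the document IDs, values and helper names are made up, and real shards are of course not dicts.

```python
def apply_updates(index, updates):
    """Return a copy of the index with (doc_id, value) updates applied."""
    result = dict(index)
    for doc_id, value in updates:
        result[doc_id] = value
    return result


def diverging_docs(a, b):
    """List the doc IDs whose content differs between the two sides."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# State shared by both halves before the split.
before_split = {"doc1": "v1", "doc2": "v1"}

# During the split, clients talk to different halves.
side_a = apply_updates(before_split, [("doc1", "v2-from-client-A")])
side_b = apply_updates(before_split, [("doc1", "v2-from-client-B"), ("doc3", "v1")])

conflicts = diverging_docs(side_a, side_b)  # ["doc1", "doc3"]
```

Both sides hold writes the other never saw, and for doc1 there is no way to tell from the data alone which version is the "true" one.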
Having an Index Verifier process (something we're working on) could end up
being a quicker way to repair a dodgy cluster than a full reindex after
something like this, but once you're in split brain, data integrity is
pretty much lost as soon as you have a single update.
That's how I read it anyway.
On 7 July 2011 09:03, Karussell email@example.com wrote:
I read about this split brain problem recently. Why is it "practically
guaranteeing data loss/corruption"? Could someone point me to a
document where this is explained?
I only found wikipedia and this one:
"When this happens, systems on both sides of the partition can restart
applications from the other side resulting in duplicate services, or
Why can't a healthy service on a node prevent itself from being
restarted or duplicated?
On 7 Jul., 00:30, Shay Banon shay.ba...@elasticsearch.com wrote:
Actually, I did some work to improve on that in master:
https://github.com/elasticsearch/elasticsearch/issues/1079. This solves a
lot of the cases of split brain, and makes zen more usable for this.
On Thursday, July 7, 2011 at 1:25 AM, Paul Smith wrote:
I had originally thought I had seen some information about how the Zen
discovery protocol deals with the Split Brain problem, but alas now I can't
find it, or it wasn't there and I was dreaming it up in a caffeine fuelled
I spotted a recent pull request/discussion about Zookeeper (Issue
#1057). I know Zookeeper is designed around that quorum process and, as I
understand it (which is saying, I probably don't), has defences against
Given the catastrophe of a split brain scenario, where rogue splinter
cells of the cluster go off on their own, practically guaranteeing data
loss/corruption, is there anything specific one should know about Zen and
It's obviously not an easy problem to solve, I'd just like to be 100%
clear to my team if this does/does not address (or perhaps partially
addresses) this evil scenario. Obviously Zookeeper is a much larger beast to
be pulling in to ES, and Zen is simple and fairly elegant. Shay, what are
your thoughts on this matter in practical terms?