Zen, Split Brain and the Art of Master Election

I read about this split brain problem recently. Why is it "practically
guaranteeing data loss/corruption"? Could someone point me to a
document where this is explained?

I only found wikipedia and this one:

http://unixworld2010.wordpress.com/2011/01/10/how-vcs-avoids-split-brain/

"When this happens, systems on both sides of the partition can restart
applications from the other side resulting in duplicate services, or
'split-brain'"
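A toy model helps show why "duplicate services" turns into lost data. This is a purely illustrative sketch. It assumes a hypothetical `Partition` class and a naive reconciliation rule: when the network heals, one side's copy simply replaces the other's. Real systems reconcile differently, but whichever side "loses" has acknowledged writes that no longer exist afterwards.

```python
# Toy model of split brain: each side of a partition promotes its own master
# and accepts writes independently. Partition and heal() are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    nodes: list[str]
    data: dict[str, str] = field(default_factory=dict)

    def elect_master(self) -> str:
        # No quorum rule: any non-empty side just picks a master for itself.
        return min(self.nodes)

    def write(self, key: str, value: str) -> None:
        # The local master accepts the write; the other side never sees it.
        self.data[key] = value


def heal(winner: Partition, loser: Partition) -> list[str]:
    """Naive rejoin (an assumption): keep the winner's copy and return the
    keys whose writes on the losing side are silently discarded."""
    lost = [k for k, v in loser.data.items() if winner.data.get(k) != v]
    loser.data = dict(winner.data)
    return lost


left = Partition("left", ["node-a", "node-b"], {"doc1": "v0"})
right = Partition("right", ["node-c"], {"doc1": "v0"})

masters = {left.elect_master(), right.elect_master()}
print("masters after partition:", sorted(masters))  # two masters at once

left.write("doc1", "v1-from-left")
right.write("doc1", "v1-from-right")   # conflicting update
right.write("doc2", "only-on-right")   # write that exists on one side only

print("discarded on heal:", heal(left, right))
```

Both sides acknowledged their writes to clients, yet after the heal `doc1` and `doc2` from the right-hand side are gone. Any other merge rule just changes which acknowledged writes get lost or mangled.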

Why can't a healthy service on a node prevent itself from being
restarted or duplicated?

Regards,
Peter.

On 7 Jul., 00:30, Shay Banon shay.ba...@elasticsearch.com wrote:

Heya,

Actually, I did some work to improve on that in master: "Zen Discovery: Add `minimum_master_nodes` setting helping with split brains" (Issue #1079, elastic/elasticsearch on GitHub). This solves many of the split brain cases and makes Zen more usable for this.
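The idea behind `minimum_master_nodes` (exposed in elasticsearch.yml as `discovery.zen.minimum_master_nodes`) can be sketched in a few lines: a node only takes part in electing a master if it can see at least that many master-eligible nodes. This is not the actual Zen code. The helper names are hypothetical, and the sketch assumes the commonly recommended value of a strict majority, (N / 2) + 1.

```python
# Sketch of a minimum_master_nodes-style guard on master election.
# quorum() and can_elect_master() are hypothetical helpers, not Elasticsearch code.


def quorum(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: (N // 2) + 1."""
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible: int, minimum_master_nodes: int) -> bool:
    """A node only elects a master if it can see (counting itself) at least
    minimum_master_nodes master-eligible nodes."""
    return visible_master_eligible >= minimum_master_nodes


TOTAL = 5
sides = (3, 2)  # a 5-node cluster partitioned into a 3-node and a 2-node side

for setting in (1, quorum(TOTAL)):
    electing = [s for s in sides if can_elect_master(s, setting)]
    print(f"minimum_master_nodes={setting}: sides able to elect a master -> {electing}")
```

With a setting of 1 both sides elect a master (split brain); with a majority of 3, only the 3-node side does, and the 2-node side waits until it can see enough nodes again.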

-shay.banon

On Thursday, July 7, 2011 at 1:25 AM, Paul Smith wrote:

I had originally thought I had seen some information about how the Zen discovery protocol deals with the Split Brain problem, but alas now I can't find it, or it wasn't there and I dreamt it up in a caffeine-fuelled haze.

I spotted a recent pull request/discussion about Zookeeper (Issue #1057). I know Zookeeper is designed around a quorum process and, as I understand it (which is to say, I probably don't), has defences against Split Brain.
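The defence Zookeeper relies on is the majority quorum: an operation needs agreement from more than half of the ensemble. Because any two majorities of the same set share at least one member, two disjoint sides of a partition can never both hold one. The brute-force check below (helper names are hypothetical) makes that property concrete, and also shows why a threshold of "exactly half" is not enough.

```python
# Brute-force check that majority quorums always overlap.
# quorums() and all_quorums_intersect() are hypothetical helpers.
from itertools import combinations


def quorums(members: list[str], need: int) -> list[frozenset[str]]:
    """Every subset of members with at least `need` elements."""
    return [
        frozenset(c)
        for size in range(need, len(members) + 1)
        for c in combinations(members, size)
    ]


def all_quorums_intersect(members: list[str], need: int) -> bool:
    """True if every pair of quorums shares at least one member."""
    qs = quorums(members, need)
    return all(a & b for a in qs for b in qs)


for n in range(1, 8):
    ensemble = [f"server-{i}" for i in range(n)]
    majority = n // 2 + 1
    print(n, "majority quorums always overlap:", all_quorums_intersect(ensemble, majority))

# With 4 servers, a threshold of exactly half (2) lets {0, 1} and {2, 3}
# both act independently: the split brain case.
four = [f"server-{i}" for i in range(4)]
print("half-of-4 quorums always overlap:", all_quorums_intersect(four, 2))
```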

Given the catastrophe of a split brain scenario, where rogue splinter cells of the cluster go off on their own, practically guaranteeing data loss/corruption, is there anything specific one should know about Zen and Split Brain?

It's obviously not an easy problem to solve; I'd just like to be 100% clear with my team on whether Zen does, does not, or only partially addresses this evil scenario. Obviously Zookeeper is a much larger beast to pull into ES, and Zen is simple and fairly elegant. Shay, what are your thoughts on this in practical terms?