I read about this split brain problem recently. Why is it "practically
guaranteeing data loss/corruption"? Could someone point me to a
document where this is explained?
I only found wikipedia and this one:
http://unixworld2010.wordpress.com/2011/01/10/how-vcs-avoids-split-brain/
"When this happens, systems on both sides of the partition can restart
applications from the other side resulting in duplicate services, or
'split-brain'"
Why can't a healthy service on a node prevent itself from being
restarted or duplicated?
Regards,
Peter.
On 7 Jul., 00:30, Shay Banon shay.ba...@elasticsearch.com wrote:
Heya,
Actually, I did some work to improve on that in master: "Zen Discovery: Add `minimum_master_nodes` setting helping with split brains" (Issue #1079, elastic/elasticsearch on GitHub). This solves a lot of the cases of split brain, and makes zen more usable for this.
-shay.banon
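
For readers following along, here is a minimal sketch of the idea behind a `minimum_master_nodes`-style setting. It is not the Zen discovery implementation. It assumes the usual majority rule (more than half of the master-eligible nodes) and that a partition side may elect a master only if it can see at least that many nodes. The function names are hypothetical and chosen for illustration only.

```python
from typing import List


def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest node count that is a strict majority (assumed rule: N // 2 + 1)."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes: int, minimum_master_nodes: int) -> bool:
    """A partition side may elect a master only if it sees enough nodes."""
    return visible_nodes >= minimum_master_nodes


def sides_with_master(partition_sizes: List[int], minimum_master_nodes: int) -> List[int]:
    """Return the sizes of the partition sides that would elect their own master."""
    return [
        size
        for size in partition_sizes
        if can_elect_master(size, minimum_master_nodes)
    ]


if __name__ == "__main__":
    # Three nodes split 2 / 1: with a majority quorum only one side elects a master.
    quorum = majority_quorum(3)
    print(quorum, sides_with_master([2, 1], quorum))

    # Four nodes split 2 / 2 with a threshold of 1: both sides elect a master,
    # which is the split-brain case discussed in this thread.
    print(sides_with_master([2, 2], 1))

    # Same split with a majority quorum: neither side can elect a master,
    # so the cluster stalls instead of diverging.
    print(sides_with_master([2, 2], majority_quorum(4)))
```

The trade-off shown in the last case is deliberate: with a majority threshold, an even split leaves the cluster without a master until the partition heals, rather than letting two masters accept conflicting writes.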
On Thursday, July 7, 2011 at 1:25 AM, Paul Smith wrote:
I had originally thought I had seen some information about how the Zen discovery protocol deals with the Split Brain problem, but alas now I can't find it, or it wasn't there and I was dreaming it up in a caffeine fuelled haze.
I spotted a recent pull request/discussion about Zookeeper (Issue #1057). I know Zookeeper is designed around that quorum process and, as I understand it (which is to say, I probably don't), has defences against Split Brain.
Given the catastrophe of a split brain scenario, where rogue splinter cells of the cluster go off on their own, practically guaranteeing data loss/corruption, is there anything specific one should know about Zen and Split Brain?
It's obviously not an easy problem to solve. I'd just like to be 100% clear to my team whether this does or does not address (or perhaps partially addresses) this evil scenario. Obviously Zookeeper is a much larger beast to be pulling in to ES, and Zen is simple and fairly elegant. Shay, what are your thoughts on this matter in practical terms?