Zen, Split Brain and the Art of Master Election

Paul_Smith · July 6, 2011, 10:25pm

I had originally thought I had seen some information about how the Zen
discovery protocol deals with the Split Brain problem, but alas now I can't
find it, or it wasn't there and I was dreaming it up in a caffeine fuelled
haze.

I spotted a recent pull request/discussion about Zookeeper (Issue #1057). I
know Zookeeper is designed around that quorum process and, as I understand
it (which is saying, I probably don't), has defences against Split Brain.

Given the catastrophe of a split brain scenario, where rogue splinter cells
of the cluster go off on their own, practically guaranteeing data
loss/corruption, is there anything specific one should know about Zen and
Split Brain?

It's obviously not an easy problem to solve; I'd just like to be 100% clear
to my team whether Zen does/does not address (or perhaps partially addresses)
this evil scenario. Obviously Zookeeper is a much larger beast to be
pulling in to ES, and Zen is simple and fairly elegant. Shay, what are your
thoughts on this matter in practical terms?

kimchy · July 6, 2011, 10:30pm

Heya,

Actually, I did some work to improve on that in master: "Zen Discovery: Add `minimum_master_nodes` setting helping with split brains" (Issue #1079, elastic/elasticsearch on GitHub). This solves a lot of the cases of split brain, and makes zen more usable for this.

-shay.banon
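
To make the idea behind `minimum_master_nodes` concrete, here is a minimal Python sketch of the arithmetic only, not of the actual Zen code. It assumes the usual majority rule (half the master-eligible nodes, rounded down, plus one); the function names `majority_quorum` and `both_sides_can_elect` are made up for illustration.

```python
def majority_quorum(master_eligible: int) -> int:
    """Smallest node count that at most one side of a partition can reach."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def both_sides_can_elect(master_eligible: int, minimum_master_nodes: int) -> bool:
    """True if some two-way split leaves both sides at or above the minimum."""
    for side_a in range(master_eligible + 1):
        side_b = master_eligible - side_a
        if side_a >= minimum_master_nodes and side_b >= minimum_master_nodes:
            return True
    return False


if __name__ == "__main__":
    for n in (3, 4, 5):
        quorum = majority_quorum(n)
        # With a majority quorum, no two-way split lets both halves elect a master.
        print(n, quorum, both_sides_can_elect(n, quorum))
```

With 4 master-eligible nodes, for example, a minimum of 2 would still allow a 2/2 split where both halves elect a master, while a minimum of 3 would not.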

Karussell1 · July 6, 2011, 11:03pm

I read about this split brain problem recently. Why is it "practically
guaranteeing data loss/corruption"? Could someone point me to a
document where this is explained?

I only found wikipedia and this one:

http://unixworld2010.wordpress.com/2011/01/10/how-vcs-avoids-split-brain/

"When this happens, systems on both sides of the partition can restart
applications from the other side resulting in duplicate services, or
'split-brain'"

Why can't a healthy service on a node prevent itself from being
restarted or duplicated?

Regards,
Peter.

Paul_Smith · July 6, 2011, 11:18pm

Thanks Shay, that looks to cover many of the cases I can think of. My boss
has a background in this from work in SGI's filesystem group (CXFS), with a
LOT of burn marks from large clusters and real world split brain issues, so
he's just being diligent in asking. If it ever happened, it would be a
world of hurt to clean up (if reindexing took a bloody long time, for
example).

Paul_Smith · July 6, 2011, 11:25pm

In a truly pathological case, where 2 halves of a cluster don't know about
each other, both sides elect themselves a master (unless, say, this Zen
extension prevents one side from doing that because it realises there's no
quorum on its side).

You now have 2 clusters, which both think they're valid. With 2 masters, both
masters would be trying to re-replicate shards and work towards green
health, shuffling things around. This might not be SO bad (but it
probably is) but if you have clients of these 2 clusters, perhaps split
between the 2 evil clusters, then, say, half the updates are going to one,
half to the other.

You now have 2 clusters, neither of which has the true index state. I'm
pretty sure the only recourse then, after addressing the connectivity issues
(fixing the split brain), is to blow away the index and reindex, because you
can't trust either half.

Having an Index Verifier process (something we're working on) could end up
being quicker to repair a dodgy cluster than a full reindex, after
something like this, but once you're in split brain, data integrity is
pretty much lost as soon as you have a single update.

That's how I read it anyway.

Paul
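
As a toy illustration of the divergence described above (plain Python dicts standing in for an index, nothing ES-specific), the sketch below applies different client updates to each half after the split. The document ids and values are invented for the example.

```python
def apply_updates(index: dict, updates: list) -> dict:
    """Return a copy of the index with (doc_id, value) updates applied in order."""
    result = dict(index)
    for doc_id, value in updates:
        result[doc_id] = value
    return result


# State shared by both halves at the moment the network partitions.
before_split = {"doc1": "v1", "doc2": "v1"}

# Clients are split between the two halves, so each half sees different writes.
half_a = apply_updates(before_split, [("doc1", "v2-from-a"), ("doc3", "new-in-a")])
half_b = apply_updates(before_split, [("doc1", "v2-from-b"), ("doc2", "v2-from-b")])

# Every document on which the two halves now disagree (including missing ones).
all_ids = sorted(set(half_a) | set(half_b))
conflicting = [d for d in all_ids if half_a.get(d) != half_b.get(d)]

if __name__ == "__main__":
    print(conflicting)
```

Neither dict alone holds all the writes, and for `doc1` there is no way to tell from the data which version should win, which is why neither half can be trusted afterwards.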

kimchy · July 7, 2011, 2:28pm

Paul explained well what a split brain is and what the problem with it is. As he also noted, the new improvement to zen discovery means that nodes that don't see "enough" nodes in the cluster (and it's up to you to define what enough is) will disconnect and try to join the cluster again. They will only join once they see enough master-eligible nodes.
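
The behaviour described here can be sketched roughly as follows. This is a simplified Python model, not the Zen implementation: the function `elect_master`, the node names, and the "lowest node id wins" tie-break are all assumptions made for the example.

```python
from typing import List, Optional


def elect_master(visible_master_eligible: List[str],
                 minimum_master_nodes: int) -> Optional[str]:
    """Return the elected master id, or None if too few candidates are visible."""
    if len(visible_master_eligible) < minimum_master_nodes:
        # Not "enough" nodes in sight: do not elect, keep retrying discovery.
        return None
    # Assumed tie-break for this sketch only: lowest node id wins.
    return min(visible_master_eligible)


cluster = ["node1", "node2", "node3", "node4", "node5"]
majority_side = ["node1", "node2", "node3"]
minority_side = ["node4", "node5"]

if __name__ == "__main__":
    # With a minimum of 3, only the larger side of a 3/2 partition gets a master.
    print(elect_master(majority_side, 3))
    print(elect_master(minority_side, 3))
    # With a minimum of 1, the smaller side elects its own master too: split brain.
    print(elect_master(minority_side, 1))
    # Once the partition heals, everyone sees the full cluster again.
    print(elect_master(cluster, 3))
```

The minority side getting `None` corresponds to the nodes that disconnect and only rejoin once they can see enough master-eligible nodes.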

Karussell1 · July 10, 2011, 9:33am

Thanks!
