Split brain on 0.17.8 -> 0.18.7 upgrade with unicast

We upgraded two clusters from 0.17.8 to 0.18.7 that were using unicast
discovery (due to network restrictions) and ended up in a split-brain
situation: nodes would only discover other nodes running the same
version as themselves.
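
For reference, "using unicast" here means a zen discovery configuration
along these lines in elasticsearch.yml on every node (the hosts and
port shown are illustrative):

# elasticsearch.yml: disable multicast and list the nodes to ping explicitly
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts:
  - search-01.example.com:9300
  - search-02.example.com:9300
  - search-03.example.com:9300
  - search-04.example.com:9300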

The first node to be upgraded to 0.18.7 started spewing these errors
in its log when it came back up:

[2012-01-11 10:03:53,674][INFO ][node                     ]
[search-01] {elasticsearch/0.17.8}[19888]: stopping ...
[2012-01-11 10:03:53,929][INFO ][node ]
[search-01] {elasticsearch/0.17.8}[19888]: stopped
[2012-01-11 10:03:53,929][INFO ][node ]
[search-01] {elasticsearch/0.17.8}[19888]: closing ...
[2012-01-11 10:03:54,228][INFO ][node ]
[search-01] {elasticsearch/0.17.8}[19888]: closed
[2012-01-11 10:05:49,512][INFO ][node ]
[search-01] {0.18.7}[7135]: initializing ...
[2012-01-11 10:05:49,519][INFO ][plugins ]
[search-01] loaded [], sites [bigdesk, head]
[2012-01-11 10:05:50,776][INFO ][node ]
[search-01] {0.18.7}[7135]: initialized
[2012-01-11 10:05:50,776][INFO ][node ]
[search-01] {0.18.7}[7135]: starting ...
[2012-01-11 10:05:50,836][INFO ][transport ]
[search-01] bound_address {inet[/0:0:0:0:0:0:0:0:9300]},
publish_address {inet[/10.147.174.140:9300]}
[2012-01-11 10:05:50,970][WARN ][discovery.zen.ping.unicast]
[search-01] failed to send ping to
[[#zen_unicast_4#][inet[search-04.example.com/10.147.176.146:9300]]]
org.elasticsearch.transport.RemoteTransportException:
[search-04][inet[/10.147.176.146:9300]][discovery/zen/unicast]
Caused by: java.io.EOFException
at org.elasticsearch.common.io.stream.StreamInput.readBoolean(StreamInput.java:229)
at org.elasticsearch.discovery.zen.ping.ZenPing$PingResponse.readFrom(ZenPing.java:89)
at org.elasticsearch.discovery.zen.ping.ZenPing$PingResponse.readPingResponse(ZenPing.java:82)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$UnicastPingRequest.readFrom(UnicastZenPing.java:400)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:182)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:87)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:302)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:65)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Then, as other upgraded nodes came back up on 0.18.7, it started
discovering them:

[2012-01-11 10:05:53,898][INFO ][cluster.service          ]
[search-01] new_master
[search-01][M3fXnCk3TxS6T-GPjfgYQg][inet[/10.147.174.140:9300]],
reason: zen-disco-join (elected_as_master)
[2012-01-11 10:05:53,918][INFO ][discovery ]
[search-01] example/M3fXnCk3TxS6T-GPjfgYQg
[2012-01-11 10:05:54,055][INFO ][http ]
[search-01] bound_address {inet[/0:0:0:0:0:0:0:0:9200]},
publish_address {inet[/10.147.174.140:9200]}
[2012-01-11 10:05:54,055][INFO ][node ]
[search-01] {0.18.7}[7135]: started
[2012-01-11 10:05:54,078][INFO ][gateway ]
[search-01] recovered [2] indices into cluster_state
[2012-01-11 10:05:54,080][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [0] for indices
[search_engine_20120110_104503]
[2012-01-11 10:05:54,955][INFO ][cluster.metadata ]
[search-01] [search_engine_20120110_104503] auto expanded replicas to
[0]
[2012-01-11 10:05:54,957][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [0] for indices
[search_engine_20120109_104503]
[2012-01-11 10:05:54,961][INFO ][cluster.metadata ]
[search-01] [search_engine_20120109_104503] auto expanded replicas to
[0]
[2012-01-11 10:05:54,962][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [0] for indices
[search_engine_20120109_104503]
[2012-01-11 10:05:54,966][INFO ][cluster.metadata ]
[search-01] [search_engine_20120109_104503] auto expanded replicas to
[0]
[2012-01-11 10:06:08,316][INFO ][cluster.service ]
[search-01] added
{[search-02][exui0NkQQNyjngxQlyX4PQ][inet[/10.147.176.140:9300]],},
reason: zen-disco-receive(join from
node[[search-02][exui0NkQQNyjngxQlyX4PQ][inet[/10.147.176.140:9300]]])
[2012-01-11 10:06:08,320][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [1] for indices
[search_engine_20120110_104503]
[2012-01-11 10:06:08,382][INFO ][cluster.metadata ]
[search-01] [search_engine_20120110_104503] auto expanded replicas to
[1]
[2012-01-11 10:06:08,382][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [1] for indices
[search_engine_20120109_104503]
[2012-01-11 10:06:08,387][INFO ][cluster.metadata ]
[search-01] [search_engine_20120109_104503] auto expanded replicas to
[1]
[2012-01-11 10:06:08,388][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [1] for indices
[search_engine_20120109_104503]
[2012-01-11 10:06:08,392][INFO ][cluster.metadata ]
[search-01] [search_engine_20120109_104503] auto expanded replicas to
[1]
[2012-01-11 10:07:44,837][INFO ][cluster.service ]
[search-01] added
{[search-03][xG_BCl-uTz69cwBslygwsQ][inet[/10.147.174.142:9300]],},
reason: zen-disco-receive(join from
node[[search-03][xG_BCl-uTz69cwBslygwsQ][inet[/10.147.174.142:9300]]])
[2012-01-11 10:07:44,839][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [2] for indices
[search_engine_20120110_104503]
[2012-01-11 10:07:44,896][INFO ][cluster.metadata ]
[search-01] [search_engine_20120110_104503] auto expanded replicas to
[2]
[2012-01-11 10:07:44,897][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [2] for indices
[search_engine_20120109_104503]
[2012-01-11 10:07:44,902][INFO ][cluster.metadata ]
[search-01] [search_engine_20120109_104503] auto expanded replicas to
[2]
[2012-01-11 10:07:44,902][INFO ][cluster.metadata ]
[search-01] updating number_of_replicas to [2] for indices
[search_engine_20120109_104503]
[2012-01-11 10:07:44,907][INFO ][cluster.metadata ]
[search-01] [search_engine_20120109_104503] auto expanded replicas to
[2]
[2012-01-11 10:07:45,651][DEBUG][action.admin.cluster.node.info]
[search-01] failed to execute on node [xG_BCl-uTz69cwBslygwsQ]

Meanwhile, the nodes still on 0.17.8 logged "Message not fully read"
warnings before they too went down for their 0.18.7 upgrade:

[2012-01-11 10:03:53,928][INFO ][cluster.service          ]
[search-03] removed
{[search-01][VGeQKtjoT4axRLKI5K_EWQ][inet[/10.147.174.140:9300]],},
reason: zen-disco-receive(from master
[[search-02][LWPgo4rHRASq1nrCI2ZRAA][inet[/10.147.176.140:9300]]])
[2012-01-11 10:04:13,164][INFO ][discovery.zen ]
[search-03] master_left
[[search-02][LWPgo4rHRASq1nrCI2ZRAA][inet[/10.147.176.140:9300]]],
reason [shut_down]
[2012-01-11 10:04:13,180][INFO ][cluster.service ]
[search-03] master {new
[search-03][5wOHm0i8SmeVqL5JkRSaFQ][inet[/10.147.174.142:9300]],
previous [search-02][LWPgo4rHRASq1nrCI2ZRAA][inet[/10.147.176.140:9300]]},
removed {[search-02][LWPgo4rHRASq1nrCI2ZRAA][inet[/10.147.176.140:9300]],},
reason: zen-disco-master_failed
([search-02][LWPgo4rHRASq1nrCI2ZRAA][inet[/10.147.176.140:9300]])
[2012-01-11 10:04:13,197][INFO ][cluster.metadata ]
[search-03] updating number_of_replicas to [1] for indices
[search_engine_20120110_104503]
[2012-01-11 10:04:13,200][INFO ][cluster.metadata ]
[search-03] [search_engine_20120110_104503] auto expanded replicas to
[1]
[2012-01-11 10:04:13,200][INFO ][cluster.metadata ]
[search-03] updating number_of_replicas to [1] for indices
[search_engine_20120109_104503]
[2012-01-11 10:04:13,202][INFO ][cluster.metadata ]
[search-03] [search_engine_20120109_104503] auto expanded replicas to
[1]
[2012-01-11 10:04:13,202][INFO ][cluster.metadata ]
[search-03] updating number_of_replicas to [1] for indices
[search_engine_20120110_104503]
[2012-01-11 10:04:13,204][INFO ][cluster.metadata ]
[search-03] [search_engine_20120110_104503] auto expanded replicas to
[1]
[2012-01-11 10:04:13,204][INFO ][cluster.metadata ]
[search-03] updating number_of_replicas to [1] for indices
[search_engine_20120109_104503]
[2012-01-11 10:04:13,205][INFO ][cluster.metadata ]
[search-03] [search_engine_20120109_104503] auto expanded replicas to
[1]
[2012-01-11 10:04:13,205][INFO ][cluster.metadata ]
[search-03] updating number_of_replicas to [1] for indices
[search_engine_20120109_104503]
[2012-01-11 10:04:13,207][INFO ][cluster.metadata ]
[search-03] [search_engine_20120109_104503] auto expanded replicas to
[1]
[2012-01-11 10:05:50,946][WARN ][transport.netty ]
[search-03] Message not fully read (request) for [0] and action
[discovery/zen/unicast], resetting
[2012-01-11 10:05:52,380][WARN ][transport.netty ]
[search-03] Message not fully read (request) for [4] and action
[discovery/zen/unicast], resetting
[2012-01-11 10:05:53,884][WARN ][transport.netty ]
[search-03] Message not fully read (request) for [7] and action
[discovery/zen/unicast], resetting
[2012-01-11 10:06:05,333][WARN ][transport.netty ]
[search-03] Message not fully read (request) for [1] and action
[discovery/zen/unicast], resetting
[2012-01-11 10:06:06,783][WARN ][transport.netty ]
[search-03] Message not fully read (request) for [6] and action
[discovery/zen/unicast], resetting
[2012-01-11 10:06:08,288][WARN ][transport.netty ]
[search-03] Message not fully read (request) for [10] and action
[discovery/zen/unicast], resetting
[2012-01-11 10:06:48,924][INFO ][node ]
[search-03] {elasticsearch/0.17.8}[31667]: stopping ...
[2012-01-11 10:06:49,174][INFO ][node ]
[search-03] {elasticsearch/0.17.8}[31667]: stopped
[2012-01-11 10:06:49,175][INFO ][node ]
[search-03] {elasticsearch/0.17.8}[31667]: closing ...
[2012-01-11 10:06:49,587][INFO ][node ]
[search-03] {elasticsearch/0.17.8}[31667]: closed
[2012-01-11 10:07:40,405][INFO ][node ]
[search-03] {0.18.7}[31045]: initializing ...
[2012-01-11 10:07:40,413][INFO ][plugins ]
[search-03] loaded [], sites [bigdesk, head]
[2012-01-11 10:07:41,712][INFO ][node ]
[search-03] {0.18.7}[31045]: initialized
[2012-01-11 10:07:41,713][INFO ][node ]
[search-03] {0.18.7}[31045]: starting ...
[2012-01-11 10:07:41,769][INFO ][transport ]
[search-03] bound_address {inet[/0:0:0:0:0:0:0:0:9300]},
publish_address {inet[/10.147.174.142:9300]}

Anyway, I'll see if it happens again the next time we upgrade. It
didn't happen with upgrades within 0.17.*, and given the way our
cluster is set up (we aren't populating data all the time), having a
split brain for a bit is not that big of a deal.

Hi Ævar

On Wed, 2012-01-11 at 12:53 +0100, Ævar Arnfjörð Bjarmason wrote:

> We upgraded two clusters from 0.17.8 to 0.18.7 that were using unicast
> discovery (due to network restrictions) and ended up in a split-brain
> situation: nodes would only discover other nodes running the same
> version as themselves.

Yes - while you can sometimes do a rolling upgrade on "major" version
numbers, the general advice is not to do so as they are not guaranteed
to be able to talk to each other.

clint

On Wed, Jan 11, 2012 at 13:00, Clinton Gormley clint@traveljury.com wrote:

> Yes - while you can sometimes do a rolling upgrade on "major" version
> numbers, the general advice is not to do so as they are not guaranteed
> to be able to talk to each other.

So what is the general advice for doing upgrades across major
revisions?

On Wed, Jan 11, 2012 at 13:31, Shay Banon kimchy@gmail.com wrote:

Full cluster restart. Shut down the cluster, upgrade, and then start it
with the new version.
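
A minimal sketch of that shutdown step, assuming the nodes-shutdown API
that the 0.18-era versions exposed (the helper name and host/port are
placeholders, and the API no longer exists in current releases):

import json
import urllib.request

def shutdown_all_nodes(host="localhost", port=9200):
    # Ask every node in the cluster to exit via the old nodes-shutdown API,
    # then upgrade the packages and start the nodes again on the new version.
    req = urllib.request.Request(
        url="http://%s:%d/_cluster/nodes/_shutdown" % (host, port),
        data=b"",  # empty body; makes this an explicit POST
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

Once the binaries are upgraded everywhere, the nodes can be started
again to form a single cluster on the new version.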

On Wed, Jan 11, 2012 at 2:28 PM, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:

> And if you want to service search requests without that downtime?
>
> I guess the only viable way to do that is to:
>
>   1. Stop anything that may be adding new data to the cluster
>   2. Start a rolling restart, suffer the split brain
>   3. Once everything's restarted we'll have a unified brain with no conflicts
>   4. Restart whatever's adding data to the cluster

Yea, that would work, though I suggest stopping half the cluster (and
disabling reallocation, since you are going to start it back up) using
the cluster update settings API with the disable_allocation setting. On
the half of the cluster that is down, start the new version, and then
restart the other nodes that are still on the old version.

p.s. Your use of "split brain" is not really appropriate in this case,
since you are causing it yourself by running two different major
versions of elasticsearch in the same cluster.
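
A minimal sketch of the allocation-disabling step described above,
using the cluster update settings API (the helper name, host/port, and
the exact 0.18-era setting key are assumptions):

import json
import urllib.request

def set_allocation_disabled(disabled, host="localhost", port=9200):
    # Toggle cluster-wide shard allocation. The "transient" wrapper and the
    # cluster.routing.allocation.disable_allocation key follow the old
    # cluster update settings API; check the docs for your exact version.
    body = json.dumps(
        {"transient": {"cluster.routing.allocation.disable_allocation": disabled}}
    ).encode("utf-8")
    req = urllib.request.Request(
        url="http://%s:%d/_cluster/settings" % (host, port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

Call set_allocation_disabled(True) before stopping the first half of the
nodes, and set_allocation_disabled(False) once every node is back up on
the new version.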
