How to upgrade a cluster?


(caphrim007) #1

I have a default 5 shard, 1 replica cluster on 0.17.6 and was
wondering how one upgrades to 0.17.7 and, for that matter, other
future releases.

I tried the following

  • stopping the replica
  • copying the contents of the new tarball over the old files
  • starting the replica again

I expected the replica to pick up where it left off and rejoin the
cluster, with the master sending over any updates it had missed.

Instead I received a lot of errors on both the slave and the master:

[2011-09-21 09:51:29,604][WARN ][cluster.action.shard ] [dtmc]
sending failed shard for [paste][0], node[v4DssP8vRZW3DpFqHx3Kgg],
[R], s[INITIALIZING], reason [Failed to start shard, message
[RecoveryFailedException[Index Shard [paste][0]: Recovery failed from
[dtmb][-2ebGZhPTYiT-wJbpgC3bg][inet[/192.168.11.14:9300]] into [dtmc]
[v4DssP8vRZW3DpFqHx3Kgg][inet[/192.168.11.66:9300]]]; nested:
RemoteTransportException[[dtmb][inet[/192.168.11.14:9300]][index/shard/
recovery/startRecovery]]; nested: RecoveryEngineException[[paste][0]
Phase[2] Execution failed]; nested: RemoteTransportException[[dtmc]
[inet[/192.168.11.66:9300]][index/shard/recovery/translogOps]];
nested:
NoSuchMethodError[org.apache.lucene.document.Fieldable.setOmitTermFreqAndPositions(Z)V]; ]]

and the slave appeared to go into a tight loop, spitting out those
errors over and over.

So I stopped the slave and am scratching my head trying to figure out
how to restart it.

What is the general recommended way of upgrading the nodes in a
cluster? Can you do them one at a time? Or do you need to bring the
entire cluster down, upgrade them all, and then start them again?

Thanks,
Tim


(Shay Banon) #2

Extracting the new release on top of the old installation will not work,
since it simply adds files to the existing directory; mixing old and new
jars on the classpath is likely what produces errors like the
NoSuchMethodError above. You should extract it somewhere else and point
the new installation at the same data location (or copy the data
directory from the old installation over to the new one).
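A minimal sketch of that layout, using made-up paths in a throwaway
sandbox (the version numbers and directory names are only illustrative):

```shell
# Simulate the layout in a temp directory; real installs would live
# somewhere like /opt. Nothing here touches an actual cluster.
base=$(mktemp -d)
mkdir -p "$base/elasticsearch-0.17.6/data/cluster" "$base/elasticsearch-0.17.7"

# Copy the old data directory into the new install (the second option
# above); alternatively, keep the data in a shared path and set path.data
# in config/elasticsearch.yml to point at it.
cp -a "$base/elasticsearch-0.17.6/data" "$base/elasticsearch-0.17.7/"

# The old install stays untouched; you start the new version from its own
# directory instead of overwriting 0.17.6 in place.
ls "$base"
```

The point is that the two versions never share an installation
directory, only the data.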

Upgrades between minor releases can be done as a rolling upgrade, one
node at a time, though it will cause more shards to move around.
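A hedged sketch of that rolling loop; the node names and the stubbed
health check are assumptions, not a documented procedure. Against a real
cluster the check would be a call to the cluster health API with
`wait_for_status=green`:

```shell
# Upgrade one node at a time, waiting for the cluster to go green
# before touching the next node.
wait_for_green() {
  # Against a live cluster this would be something like:
  #   curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=5m'
  # Stubbed out here so the sketch is self-contained.
  echo green
}

for node in dtmb dtmc; do
  echo "upgrading $node"      # stop the node, swap in the new install, restart
  status=$(wait_for_green)    # only move on once the cluster is green again
  [ "$status" = green ] || { echo "cluster not green, aborting"; exit 1; }
done
```

Waiting for green between nodes is what keeps a replica available for
every shard while its peer is down.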



(caphrim007) #3

Thanks for the information Shay.


