ES rolling upgrade 1.3.2 -> 1.4.2

Hi

We are performing a rolling upgrade from 1.3.2 to 1.4.2.

We have turned off reallocation.
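For reference, a minimal sketch of how this can be done via the cluster
settings API (Python is used here just for illustration; the localhost:9200
endpoint and the use of a transient setting are assumptions for this example):

import json
import urllib.request

ES = "http://localhost:9200"  # assumed address of any node in the cluster

def set_allocation(value):
    # value is "none" to disable shard allocation, "all" to re-enable it
    body = json.dumps({"transient": {"cluster.routing.allocation.enable": value}})
    req = urllib.request.Request(
        ES + "/_cluster/settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())

set_allocation("none")  # run before taking each node down for the upgrade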

After upgrading 2 of 3 nodes we are receiving lots of warnings/errors in
the log file:

On the node running 1.4.2:

[2015-01-13 12:22:09,902][WARN ][cluster.action.shard ]
[192.168.124.13] [edition][6] received shard failed for [edition][6],
node[FNnZJvgxQ6Or4i_Fe5H2Qw], [R], s[INITIALIZING], indexUUID
[johTS4GqSpOQbU6N5S7sCg], reason [Failed to start shard, message
[RecoveryFailedException[[edition][6]: Recovery failed from [Omega
Red][ckPNnMMdRZuZ5zU0XhrNCA][localhost][inet[/192.168.124.14:9300]]{master=true}
into
[192.168.124.12][FNnZJvgxQ6Or4i_Fe5H2Qw][localhost][inet[192.168.124.12/192.168.124.12:9300]]{master=true}];
nested: RemoteTransportException[[Omega
Red][inet[/192.168.124.14:9300]][index/shard/recovery/startRecovery]];
nested: IllegalArgumentException[No enum constant
org.apache.lucene.util.Version.LUCENE_4_10]; ]]

On the node running 1.3.2:

[2015-01-13 12:16:20,152][WARN ][transport.netty ] [Omega Red]
Message not fully read (request) for [2303] and action
[index/shard/recovery/startRecovery], resetting

Are there any known issues when upgrading from 1.3.2 to 1.4.2?

Eventually the cluster became green. Does this mean we haven't lost any
data?

After upgrading node 1 we temporarily turned on reallocation again since it
did not seem to pick up its shards again. Should you turn on reallocation
after upgrading each node or after upgrading all nodes?

Does anyone know what the error messages mean?

/ daniel

On Tue, Jan 13, 2015 at 7:32 AM, Daniel Jansson daniel.jansson@dn.se
wrote:

Are there any known issues when upgrading from 1.3.2 to 1.4.2?

Not that I know of.

Eventually the cluster became green. Does this mean we haven't lost any
data?

Yes. My understanding is that the cluster will go red if it has lost data,
even temporarily. In some cases bringing nodes back online will bring the
cluster back from red. If it is stuck red, that is when you've lost data.
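For what it's worth, a quick way to keep an eye on that is the cluster health
endpoint; a minimal sketch (the localhost:9200 address is an assumption):

import json
import urllib.request

with urllib.request.urlopen("http://localhost:9200/_cluster/health") as resp:
    health = json.loads(resp.read())

# status is green/yellow/red; unassigned_shards should drop to 0 as recovery finishes
print(health["status"], health["number_of_nodes"], health["unassigned_shards"])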

After upgrading node 1 we temporarily turned on reallocation again since
it did not seem to pick up its shards again. Should you turn on
reallocation after upgrading each node or after upgrading all nodes?

Yes.
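Either way, re-enabling allocation and waiting for recovery is just another
cluster settings call plus a health check; a rough sketch (the endpoint and
timeout are assumptions, adjust for your cluster):

import json
import urllib.request

ES = "http://localhost:9200"  # assumed; any node in the cluster will do

def put(path, body):
    req = urllib.request.Request(ES + path, data=json.dumps(body).encode("utf-8"),
                                 headers={"Content-Type": "application/json"},
                                 method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def get(path):
    with urllib.request.urlopen(ES + path) as resp:
        return json.loads(resp.read())

# once the upgraded node has rejoined, re-enable allocation ...
put("/_cluster/settings",
    {"transient": {"cluster.routing.allocation.enable": "all"}})

# ... and wait for the cluster to recover before touching the next node
print(get("/_cluster/health?wait_for_status=green&timeout=10m")["status"])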

Does anyone know what the error messages mean?

They look like they are caused by the 1.3.2 node trying to pull data from
the 1.4.2 nodes. I thought this was something they wouldn't try to do. By
any chance are you using some kind of shared storage, or did you downgrade a
node or something like that? If you did the normal upgrade procedure like
you said above, then this is probably a bug. But it's possible to cause it
by doing other stuff.

Thanks for the quick response, Nikolas.

We don't have any shared storage, and I'm pretty sure we were doing the
rolling restart by the book, but to make sure I just want to double-check
two things.

  1. The documentation on rolling upgrades states:

Running multiple versions of Elasticsearch in the same cluster for any
length of time beyond that required for an upgrade is not supported, as
shard replication from the more recent version to the previous versions
will not work.

This part is very clear. However, later in the document it says that one
should disable cluster reallocation before upgrading each node and re-enable
it after each node has booted correctly. This confuses us a bit: if we
enable reallocation in the cluster before we've upgraded all the nodes, we
might, as we understand it, run into the scenario where a newer node tries
to replicate shards to an older node. Is this correct, or are we
misunderstanding something?

  2. A second question relates to upgrading a single node. When we've
    upgraded a node, what is the best way to check whether we can re-enable
    cluster reallocation? (Or in other words, what's the best way to do a
    health check of a single node before proceeding?)

Best,
Max

1 - It's not worth worrying about; you cannot replicate from an older
version to a newer version when going between minor versions (e.g. 1.3 -> 1.4).
2 - If an upgraded node rejoins the cluster, then it's good to go.
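One way to script that check, as a sketch (the three-node count, the
localhost:9200 address and the _cat columns are assumptions here):

import json
import urllib.request

ES = "http://localhost:9200"  # assumed address of any node

# wait until all 3 nodes are back in the cluster and it is at least yellow
url = ES + "/_cluster/health?wait_for_nodes=3&wait_for_status=yellow&timeout=5m"
with urllib.request.urlopen(url) as resp:
    health = json.loads(resp.read())
print(health["number_of_nodes"], health["status"], health["timed_out"])

# list which version each node is running (handy mid-upgrade)
with urllib.request.urlopen(ES + "/_cat/nodes?h=name,version") as resp:
    print(resp.read().decode())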
