1.4.0 data node can't join existing 1.3.4 cluster

Cool :-)

Usually this means a fix will emerge. Thanks!

On Friday, November 21, 2014 10:07:03 AM UTC+1, Mark Walkom wrote:

It's being looked at, but I don't know much beyond that at the moment,
sorry.

On 21 November 2014 20:02, madsm...@colourbox.com wrote:

Can any of the Elasticsearch team members hint at whether or not this will
be fixed in 1.4.1? Then we'll simply wait for it instead of resorting to
various hacks to upgrade.

On Monday, November 17, 2014 12:35:03 PM UTC+1, Matthew Barrington wrote:

I stand corrected, this did not work on our main cluster.

On Monday, 17 November 2014 11:13:22 UTC, Matthew Barrington wrote:

We are running a 1.3.4 cluster using the AWS plugin and I noticed the
same error when I tried to upgrade a single node.

Since I was trying this on my test cluster first, I decided to see what
would happen if I upgraded a second node: would it split into two clusters,
hit the same issue, etc.?

What I discovered was that when two nodes were upgraded to 1.4, they
joined the cluster correctly and everything looked to be working.

So the problem seems to affect only the initial node joining; when you try
with two, everything works out.

On Friday, 14 November 2014 18:05:01 UTC, Eric Jain wrote:

On Fri, Nov 14, 2014 at 3:41 AM, madsm...@colourbox.com wrote:

I'm also seeing this problem when a 1.4.0 node tries joining a 1.3.4
cluster with cloud-aws plugin version 2.4.0. Is there a workaround to use
during the upgrade? I assume it's not a problem once they're all upgraded
to 1.4.0.

I ended up starting a new cluster (ignoring all the warnings logged on
startup), and restoring from a snapshot. Once all the 1.3.4 nodes were
gone, no issues.

--
Eric Jain
Got data? Get answers at zenobase.com.
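
For reference, the new-cluster-plus-snapshot path Eric describes looks
roughly like this with the 1.x snapshot API. This is a sketch only: the
repository name, snapshot name, and bucket are illustrative, and the "s3"
repository type assumes the cloud-aws plugin is installed on both clusters.

# On the old 1.3.4 cluster: register a repository and snapshot everything.
curl -XPUT 'localhost:9200/_snapshot/upgrade_repo' -d '{
  "type": "s3",
  "settings": { "bucket": "my-es-backups", "region": "us-east-1" }
}'
curl -XPUT 'localhost:9200/_snapshot/upgrade_repo/pre_1_4?wait_for_completion=true'

# On the fresh 1.4.0 cluster: register the same repository, then restore.
curl -XPUT 'localhost:9200/_snapshot/upgrade_repo' -d '{
  "type": "s3",
  "settings": { "bucket": "my-es-backups", "region": "us-east-1" }
}'
curl -XPOST 'localhost:9200/_snapshot/upgrade_repo/pre_1_4/_restore'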


Has an official issue been created? I would like to track the status.

So far, every 1.x.0 release has been buggy. :-)

--
Ivan


As mentioned, the change is due to the unicast ping action, which was split
in 1.4.0 into an old and a new action; see this commit:

https://github.com/elasticsearch/elasticsearch/commit/e5de47d928582694c7729d199390086983779e6e

I am not sure if this is a bug. It seems like a feature to prevent
accidentally electing multiple masters.

The strategy described above by Christian Hedegaard should work, though it
should still be considered a workaround:

  • setting up all new 1.4 nodes as not master eligible ("data only")

  • joining them to a 1.3.x cluster while the master is still on a 1.3 node
    should work

  • then, shutting down all 1.3 nodes (except the master) should relocate the
    shards to the 1.4 nodes

  • bringing down the final 1.3 master should "stall" master election (I
    would also configure a large timeout for master election). This phase is
    critical: no index/mapping creations/deletions or other cluster-state
    modifying actions should be executed now.

  • adding a 1.4 master-eligible node should now take over the cluster (I
    would start it with the data folder from the final 1.3 master, where the
    last cluster state is persisted), and the critical phase is over

  • from then on, more 1.4 master-eligible nodes can be added

  • finally, the minimum master nodes setting should be configured (a
    configuration sketch follows below)

Jörg
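
A minimal elasticsearch.yml sketch of what those steps imply for the new 1.4
nodes (the values are illustrative; minimum_master_nodes should be
(master-eligible nodes / 2) + 1 for your cluster):

# During the migration: data only, not master eligible
node.master: false
node.data: true

# A generous election timeout for the critical takeover phase
discovery.zen.ping_timeout: 30s

# Once enough 1.4 master-eligible nodes exist (e.g. with 3 of them):
discovery.zen.minimum_master_nodes: 2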

On Fri, Nov 21, 2014 at 1:56 AM, Christian Hedegaard <chedegaard@red5studios.com> wrote:

FYI, I have found a solution that works (at least for me).

I’ve got a small cluster for testing, only 4 v1.3.5 nodes. What I’ve done
is bring up 4 new v1.4.0 nodes as data-only machines. In the YAML I added a
line pointing the new nodes via unicast explicitly at the current master:

discovery.zen.ping.unicast.hosts: ["10.210.9.224:9300"]

When I restarted Elasticsearch with that setting, with cloud-aws 2.4.0
installed and configured, the new nodes found the cluster and joined it
properly.

I will now start nuking the old v1.3.5 nodes to migrate the data off of
them. Before the final 1.3.5 node is nuked, I will change the config on one
of the v1.4.0 nodes to allow it to become master and restart it.

I’m not sure if the master stuff is needed or not, but I was very afraid
of a split-brain problem. I have another 4-node testing cluster that I can
use to try this upgrade again in a more controlled manner.

I’m NOT looking forward to upgrading our current production cluster this
way (15 data-only nodes, 3 master-only nodes).

So it would appear that the problem is somewhere in the unicast discovery
code. The question is who’s to blame? Elasticsearch or the cloud-aws plugin?
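
For what it's worth, both the join and the drain can be watched with the
stock 1.x APIs; a rough sketch (the excluded IP is illustrative):

# Which nodes are in the cluster, and which one is the master?
curl -s 'localhost:9200/_cat/nodes?v'
curl -s 'localhost:9200/_cat/master?v'

# Drain shards off an old 1.3.5 node before shutting it down
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.exclude._ip": "10.210.9.225" }
}'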

From: Boaz Leskes <b.leskes@gmail.com>
Sent: Wednesday, November 19, 2014 2:27 PM
To: elasticsearch@googlegroups.com
Cc: Christian Hedegaard
Subject: Re: 1.4.0 data node can't join existing 1.3.4 cluster

Hi Christian,

I'm not sure which thread you're referring to exactly, but this shouldn't
happen. Can you describe the problem in a bit more detail? Anything in the
logs on the nodes (both the 1.4 node and the master)?

Cheers,

Boaz

On Wednesday, November 19, 2014 2:39:57 AM UTC+1, Christian Hedegaard wrote:

I found this thread while trying to research the same issue, and it looks
like there is currently no resolution. We like to keep up with our
elasticsearch upgrades as often as possible and do rolling upgrades to keep
our clusters up. When testing, I'm having the same issue: I cannot add a
1.4.0 box to the existing 1.3.4 cluster.

Is a fix for this anticipated?


Hi All,

I believe I found the source of the problem and it has to do with the AWS
plugin. I opened an issue for it, which should be pretty easy to fix:
https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/143
(AwsEc2UnicastHostsProvider should use version.minimumCompatibilityVersion()).

Cheers,
Boaz


Hi Christian, Daniel,

I believe I found the issue. It has to do with the cloud plugins (both AWS
and GCE) and the way they create the node list for unicast-based discovery:
effectively, they mislead it into thinking that all nodes in the cluster are
version 1.4.0, which is not correct.

I opened issues for this, so it should be corrected soon:
https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/143 (AwsEc2UnicastHostsProvider should use version.minimumCompatibilityVersion())
https://github.com/elasticsearch/elasticsearch-cloud-gce/issues/41 (UnicastHostsProvider should use version.minimumCompatibilityVersion())

Cheers,
Boaz


Great work, everyone. I feel better about upgrading now.

Heya,

We will release aws plugin 2.4.1 in a few minutes.
It fixes this rolling upgrade issue.

Note that some WARN messages may appear in the old nodes' logs until the full
rolling upgrade is done.

Thank you all for reporting this!
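
Upgrading the plugin on each node should look roughly like this with the 1.x
plugin manager (run node by node during the rolling upgrade, then restart the
node):

bin/plugin -remove cloud-aws
bin/plugin -install elasticsearch/elasticsearch-cloud-aws/2.4.1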


Awesome! I’ll monitor the cloud-aws plugin’s GitHub. Once they get a fix out,
I can test it on another one of our testing clusters.


This is working perfectly! I’ve got a test cluster that I’m in the middle of
rolling-restarting, with no issues:

elasticsearch- "number" : "1.4.0",
elasticsearch- "number" : "1.4.0",
elasticsearch- "number" : "1.3.4",
elasticsearch- "number" : "1.3.4",
elasticsearch- "number" : "1.3.4",

_cluster/health:

"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 5,
"active_primary_shards" : 1730,
"active_shards" : 3460,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0

I anticipate no other problems finishing this rolling upgrade. Thanks a ton everyone!
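
For anyone following along, a quick way to watch the same mixed-version
picture per node during the rolling upgrade (using the 1.x _cat API):

curl -s 'localhost:9200/_cat/nodes?v&h=name,version,master'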
