1.4.0 data node can't join existing 1.3.4 cluster

Heya,

We will release the AWS plugin 2.4.1 in a few minutes.
It fixes this rolling upgrade issue.

Note that some WARN messages may appear in the logs of the old nodes until the
rolling upgrade is complete.

Thank you all for reporting this!

On Sunday, November 23, 2014 03:10:42 UTC+1, Ivan Brusic wrote:

Great work, everyone. I feel better about upgrading now.
On Nov 22, 2014 4:42 PM, "Boaz Leskes" b.leskes@gmail.com wrote:

Hi Christian, Daniel,

I believe I found the issue. It has to do with the cloud plugins (both
AWS and GCE) and the way they create the node list for unicast-based
discovery. Effectively, they mislead it into thinking that all nodes in the
cluster are version 1.4.0, which is not correct.

I opened issues for this so it will be corrected soon:
  • AwsEc2UnicastHostsProvider should use version.minimumCompatibilityVersion()
    (elastic/elasticsearch-cloud-aws, issue #143)
  • UnicastHostsProvider should use version.minimumCompatibilityVersion()
    (elastic/elasticsearch-cloud-gce, issue #41)
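
The mechanism Boaz describes can be modelled in a few lines. The Python below is a hypothetical sketch, not the Elasticsearch or cloud-aws Java code: the action names, version tuples, and helpers (`pick_action`, `build_hosts`) are invented for illustration. It assumes, following Jörg's note further down that the unicast action was split in 1.4.0, that the pinging node chooses the old or new action from the version it believes the target runs.

```python
from dataclasses import dataclass

# Hypothetical model only: names and version numbers are illustrative,
# not taken from the Elasticsearch or cloud-aws source code.
CURRENT_VERSION = (1, 4, 0)
MIN_COMPAT_VERSION = (1, 0, 0)  # stand-in for version.minimumCompatibilityVersion()
NEW_ACTION_SINCE = (1, 4, 0)    # in this model, the new ping action exists from 1.4.0


@dataclass
class PlaceholderNode:
    """A discovered host whose real version is not known yet."""
    address: str
    assumed_version: tuple


def pick_action(target: PlaceholderNode) -> str:
    """Choose the wire action from the version we *believe* the target runs."""
    if target.assumed_version >= NEW_ACTION_SINCE:
        return "new-unicast-ping"  # a 1.3.x node does not understand this
    return "old-unicast-ping"      # accepted by both 1.3.x and 1.4.x in this model


def build_hosts(addresses, use_min_compat: bool):
    """Mimic a cloud hosts provider turning instance addresses into nodes."""
    version = MIN_COMPAT_VERSION if use_min_compat else CURRENT_VERSION
    return [PlaceholderNode(addr, version) for addr in addresses]


if __name__ == "__main__":
    hosts = ["10.210.9.224:9300"]
    buggy = [pick_action(n) for n in build_hosts(hosts, use_min_compat=False)]
    fixed = [pick_action(n) for n in build_hosts(hosts, use_min_compat=True)]
    print("buggy provider:", buggy)  # buggy provider: ['new-unicast-ping']
    print("fixed provider:", fixed)  # fixed provider: ['old-unicast-ping']
```

In this model, stamping every discovered address with the current version (1.4.0) makes the new node speak the 1.4-only action to 1.3 nodes, while stamping it with a minimum compatible version makes it fall back to the action both sides accept.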

Cheers,
Boaz

On Saturday, November 22, 2014 7:04:33 PM UTC+2, Jörg Prante wrote:

As I said, the change is due to the unicast action, which was split in 1.4.0
into an old and a new action; see this commit:

commit e5de47d928582694c7729d199390086983779e6e
https://github.com/elasticsearch/elasticsearch/commit/e5de47d928582694c7729d199390086983779e6e

I am not sure this is a bug. It looks like a feature to prevent
accidentally ending up with multiple masters.

The strategy described by Christian Hedegaard should work, but it should
still be considered a workaround:

  • setting up all new 1.4 nodes as non-master-eligible ("data only")

  • joining them to the 1.3.x cluster while the master is still a 1.3 node
    should work

  • then, shutting down all 1.3 nodes except the master should relocate
    the shards

  • bringing down the final 1.3 master should "stall" master election (I
    would also configure a large timeout for master election). This is the
    critical phase: no index or mapping creations/deletions, or any other
    actions that modify the cluster state, should be executed now.

  • adding a 1.4 master-eligible node should now take over the cluster (I
    would start it with the data folder of the final 1.3 master, where the
    last cluster state is persisted), and the critical phase is over.

  • from then on, more 1.4 master-eligible nodes can be added

  • finally, the minimum master nodes setting
    (discovery.zen.minimum_master_nodes) should be configured
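
The last step refers to discovery.zen.minimum_master_nodes. The usual guideline is a quorum of the master-eligible nodes, floor(n / 2) + 1. A minimal Python sketch of that arithmetic (the helper names are hypothetical) shows how the value changes as 1.4 master-eligible nodes are added one at a time:

```python
from typing import List


def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def settings_while_adding_masters(total_masters: int) -> List[int]:
    """Quorum value after each master-eligible node is added, one at a time."""
    return [minimum_master_nodes(n) for n in range(1, total_masters + 1)]


if __name__ == "__main__":
    # e.g. a layout with 3 master-only nodes
    print(settings_while_adding_masters(3))  # [1, 2, 2]
```

For the production layout Christian mentions further down (3 master-only nodes), the final value is 2. Note that with exactly 2 master-eligible nodes the quorum is also 2, so losing either one blocks master election.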

Jörg

On Fri, Nov 21, 2014 at 1:56 AM, Christian Hedegaard <chedegaard@red5studios.com> wrote:

FYI, I have found a solution that works (at least for me).

I’ve got a small test cluster of only four v1.3.5 nodes. I brought up four
new v1.4.0 nodes as data-only machines, and in their YAML config I added a
line that points them explicitly, via unicast, to the current master:

discovery.zen.ping.unicast.hosts: ["10.210.9.224:9300"]

When I restarted Elasticsearch with that setting, with cloud-aws 2.4.0
installed and configured, the new nodes found the cluster and joined it
properly.

I will now start nuking the old v1.3.5 nodes to migrate the data off of
them. Before the final 1.3.5 node is nuked, I will change the config on one
of the v1.4.0 nodes to make it master-eligible and restart it.

I’m not sure whether the master step is needed, but I was very afraid
of a split-brain problem. I have another 4-node test cluster on which I
can try this upgrade again in a more controlled manner.
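
One way to run the migration Christian describes in a more controlled manner is to check the nodes info API (GET /_nodes) before each step and group nodes by version and master eligibility. The Python below is a minimal sketch under assumptions: it assumes the 1.x response shape (a "nodes" object keyed by node ID, each entry with "name", "version", and an "attributes" map in which a non-master-eligible node has "master": "false"), and the helper names and sample data are hypothetical; check the shape against your own cluster.

```python
import json
from collections import defaultdict
from typing import Dict, List


def summarize_nodes(nodes_info: dict) -> Dict[str, Dict[str, List[str]]]:
    """Group node names by version, split by master eligibility.

    Assumes the 1.x `GET /_nodes` shape: {"nodes": {id: {"name", "version",
    "attributes"}}}, where a non-master-eligible node carries
    attributes["master"] == "false".
    """
    summary = defaultdict(lambda: {"master_eligible": [], "not_master_eligible": []})
    for node in nodes_info.get("nodes", {}).values():
        attrs = node.get("attributes", {})
        role = "not_master_eligible" if attrs.get("master") == "false" else "master_eligible"
        summary[node.get("version", "unknown")][role].append(node.get("name", "?"))
    return dict(summary)


def has_new_master_candidate(summary: dict, new_version: str = "1.4.0") -> bool:
    """True once at least one node on the new version is master-eligible."""
    return bool(summary.get(new_version, {}).get("master_eligible"))


if __name__ == "__main__":
    # Hypothetical sample response, trimmed to the fields used above.
    sample = json.loads("""{"nodes": {
      "a1": {"name": "old-1", "version": "1.3.5", "attributes": {}},
      "b1": {"name": "new-1", "version": "1.4.0", "attributes": {"master": "false"}},
      "b2": {"name": "new-2", "version": "1.4.0", "attributes": {"master": "false"}}}}""")
    summary = summarize_nodes(sample)
    print(summary)
    print(has_new_master_candidate(summary))  # False
```

Against a live cluster, the input could come from something like json.load(urllib.request.urlopen("http://localhost:9200/_nodes")), assuming HTTP is reachable on the default port. In Christian's plan, has_new_master_candidate() should return True before the final 1.3.5 node is shut down.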

I’m NOT looking forward to upgrading our current production cluster
this way (15 data-only nodes, 3 master-only nodes).

So it would appear that the problem is somewhere in the unicast
discovery code. The question is: who’s to blame, Elasticsearch or the
cloud-aws plugin?

From: Boaz Leskes [mailto:b.leskes@gmail.com]
Sent: Wednesday, November 19, 2014 2:27 PM
To: elasticsearch@googlegroups.com
Cc: Christian Hedegaard
Subject: Re: 1.4.0 data node can't join existing 1.3.4 cluster

Hi Christian,

I'm not sure exactly which thread you are referring to, but this shouldn't
happen. Can you describe your problem in more detail? Is there anything in
the logs of the nodes (both the 1.4 node and the master)?

Cheers,

Boaz

On Wednesday, November 19, 2014 2:39:57 AM UTC+1, Christian Hedegaard wrote:

I found this thread while researching the same issue, and it looks like
there is currently no resolution. We like to keep up with Elasticsearch
upgrades as closely as possible and do rolling upgrades to keep our
clusters up. In testing I’m hitting the same issue: I cannot add a 1.4.0
box to the existing 1.3.4 cluster.

Is a fix for this anticipated?

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5CF8216AA982AF47A8E6DEACA629D22B4EBF409B%40s-us-ex-6.US.R5S.com.

For more options, visit https://groups.google.com/d/optout.