Upgrading from 0.19.0 to 0.19.8

VegHead · July 6, 2012, 5:26pm

We have a 10-node cluster out on Amazon EC2 running 0.19.0 and we would
like to upgrade to 0.19.8 without any downtime. Given that it's very easy
for us to launch new instances, is the easiest approach to simply add new
nodes running 0.19.8 to the cluster, wait for it to stabilize and then start
removing the 0.19.0 nodes? Are different minor versions still compatible
with each other?

And when it comes to major versions (e.g. 0.19.x to 0.20.x, when it comes
out), we presumably have to maintain two separate clusters and index to
both of them if we want zero downtime?

-Sean

drewr · July 6, 2012, 8:00pm

VegHead wrote:

> We have a 10-node cluster out on Amazon EC2 running 0.19.0 and we
> would like to upgrade to 0.19.8 without any downtime. Given that
> it's very easy for us to launch new instances, is the easiest
> approach to simply add new nodes running 0.19.8 to the cluster,
> wait for it to stabilize and then start removing the 0.19.0 nodes?
> Are different minor versions still compatible with each other?

There are a few changes in there that would make me nervous. Lucene
& Netty were both upgraded, not to mention the state changes that led
up to the dangling index support in 0.19.8 (although Shay may have
anticipated that in 0.19.0). I don't know whether any of that
translates into inter-node communication issues, which is the key
question for a rolling upgrade.

But if you can't afford any downtime, you shouldn't even begin to
speculate. Copy your cluster as-is (maybe just a subset of nodes)
and try the upgrade along with a battery of tests. It's the only way
you'll know for sure.
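
One piece of such a test run that is easy to script is the "wait for it
to stabilize" step between adding a new node and removing an old one.
The following is a minimal sketch, not a tested procedure. It assumes
the cluster health endpoint (GET /_cluster/health) returns the fields
status, number_of_nodes, relocating_shards, initializing_shards and
unassigned_shards; verify that against the version you are running.
The function names, the timeout values and the example URL are made up
for illustration.

```python
import json
import time
from urllib.request import urlopen


def fetch_health(base_url):
    """GET <base_url>/_cluster/health and return the parsed JSON body."""
    with urlopen(base_url.rstrip("/") + "/_cluster/health") as resp:
        return json.load(resp)


def is_stable(health, expected_nodes):
    """True when the cluster is green, has the expected node count,
    and has no shards moving, initializing or unassigned."""
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


def wait_until_stable(expected_nodes, fetch, timeout_s=600, interval_s=5,
                      sleep=time.sleep):
    """Poll fetch() until is_stable() holds or timeout_s has elapsed.

    fetch is any zero-argument callable returning a health dict, so the
    loop can be exercised without a live cluster.
    """
    waited = 0
    while True:
        if is_stable(fetch(), expected_nodes):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Against a copy of the cluster this could be called as
wait_until_stable(11, lambda: fetch_health("http://localhost:9200"))
after starting the first 0.19.8 node, and again with the reduced node
count after each 0.19.0 node is shut down. A green status only says the
shards are allocated; it does not prove that the two versions interact
correctly, so the battery of tests is still needed.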

If you decide you can afford a little bit of downtime, you can
restart a green cluster with hundreds of large indices in a matter of
minutes, depending on the hardware. How many indices are we talking
about here?

> And when it comes to major versions (e.g. 0.19.x to 0.20.x, when it
> comes out), we presumably have to maintain two separate clusters
> and index to both of them if we want zero downtime?

Yes. Either reindex into the new one and swap it out, or stop
indexing on the old and leave it up for searching while you get the
new one up, etc. It sounds like an important enough requirement for
you that you should put some abstraction above a single ES cluster.
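
As a rough illustration of what such an abstraction might look like,
here is a minimal sketch of a router that writes every document to all
configured clusters and sends searches to one of them, so the search
target can be switched once the new cluster has caught up. Everything
in it is hypothetical: ClusterRouter, InMemoryCluster and their methods
are invented for this example and are not part of any ES client. A real
version would wrap actual client calls and would need a plan for
retrying the writes that fail.

```python
class InMemoryCluster:
    """Stand-in for a real cluster client, used only to exercise the router."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc

    def search(self, field, value):
        # Return the ids of documents whose field equals value.
        return sorted(i for i, d in self.docs.items() if d.get(field) == value)


class ClusterRouter:
    """Writes go to every cluster; searches go to one chosen cluster."""

    def __init__(self, clusters, search_target):
        if search_target not in clusters:
            raise KeyError(search_target)
        self.clusters = dict(clusters)
        self.search_target = search_target

    def index(self, doc_id, doc):
        """Index into all clusters and return the names of those that failed."""
        failed = []
        for name, cluster in self.clusters.items():
            try:
                cluster.index(doc_id, doc)
            except Exception:
                failed.append(name)
        return failed

    def search(self, field, value):
        return self.clusters[self.search_target].search(field, value)

    def switch_search(self, name):
        """Point searches at another cluster, e.g. after the new one is ready."""
        if name not in self.clusters:
            raise KeyError(name)
        self.search_target = name
```

With router = ClusterRouter({"old": old, "new": new}, "old"), the
application keeps calling router.index(...) and router.search(...)
throughout, and the cutover is a single router.switch_search("new").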

-Drew