Hi. I need to upgrade a simple Elasticsearch 6.5.4 cluster (3 master-eligible + data servers) to 6.8.3. The upgrade guide says this can be done without any service interruption. Disabling shard allocation and re-enabling it later is still a bit involved. Can I simply upgrade the FreeBSD package in place & restart the servers one by one, checking that each one rejoins the cluster before moving on to the next? Other steps in the guide are irrelevant for us (machine learning jobs, plugins etc).
You can, but you might find it takes a lot longer. If shard allocation is enabled then the cluster will start rebuilding lost replicas when the node leaves and will then have to discard all that work when the node comes back.
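For anyone who does want to toggle allocation around each restart, here is a rough Python sketch of the two requests involved. It assumes the cluster is reachable without authentication at `http://localhost:9200`; the helper names `allocation_request` and `send` are made up for illustration. The setting `cluster.routing.allocation.enable` is the one the rolling upgrade guide refers to; check your version's guide for the recommended value (`"primaries"` or `"none"`).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no TLS or auth


def allocation_request(mode):
    """Build (method, path, body) for a cluster settings update.

    mode is a value such as "primaries" or "none"; passing None sends
    JSON null, which resets the setting to its default ("all").
    """
    body = {"persistent": {"cluster.routing.allocation.enable": mode}}
    return "PUT", "/_cluster/settings", json.dumps(body)


def send(method, path, body, base_url=ES_URL):
    """Send a JSON request to the cluster and return the decoded response."""
    req = urllib.request.Request(
        base_url + path,
        data=body.encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Typical use would be `send(*allocation_request("primaries"))` before stopping a node and `send(*allocation_request(None))` once it has rejoined.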
Thanks, but the guide says that the replication starts after the stopped server has been absent for `index.unassigned.node_left.delayed_timeout` (by default, one minute)? The server will surely restart before that.
It is true that there is by default a short delay before reacting to the node leaving. If you are feeling lucky then you can rely on being able to upgrade and restart each node within that delay to avoid any unnecessary shard allocation.
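If one minute feels too tight, the delay can be raised on existing indices for the duration of the upgrade, since `index.unassigned.node_left.delayed_timeout` is a dynamic index setting. A minimal sketch of the request body in Python; the helper name `delayed_timeout_request` and the `"5m"` value are only examples, not recommendations.

```python
import json


def delayed_timeout_request(timeout, index="_all"):
    """Build (method, path, body) to change the delayed allocation timeout.

    timeout is an Elasticsearch time value such as "5m"; passing None
    sends JSON null, which restores the default of one minute.
    """
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return "PUT", f"/{index}/_settings", json.dumps(body)
```

The resulting tuple can be sent with any HTTP client, for example `PUT /_all/_settings` with the body above and a `Content-Type: application/json` header.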
Thanks. Basically, upgrading the node doesn't count towards the limit, only restarting it does, thanks to Unix allowing binaries to be unlinked & replaced while they're running. Some, if not most, OS package upgrading utilities rely on that assumption and might only offer restarting a service at the very end (Debian's APT, for example). FreeBSD's pkg(8) just overwrites them in place; it's your job to restart the service later.
That sounds a little risky: Elasticsearch would run into trouble if it were loading libraries dynamically and those libraries changed out from underneath it while it was running.
The time is measured from the node leaving the cluster to the time it rejoins, which can be quite some time after the node process starts.
As I said, however, the worst case is really that the cluster takes longer than expected to settle down again, so as long as that's ok you should be fine.
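To see how long a node is actually gone (from leaving to rejoining, not just process start), one option is to poll the cluster health API from another node and time it. A sketch in Python: `wait_for_rejoin` is a hypothetical helper, and `get_health` is assumed to be any callable that returns the decoded `GET /_cluster/health` response, which includes a `number_of_nodes` field.

```python
import time


def wait_for_rejoin(get_health, expected_nodes, timeout_s=600.0,
                    interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll cluster health until the expected node count is reached.

    Returns the seconds elapsed since polling started, or raises
    TimeoutError if the node has not rejoined within timeout_s.
    sleep and clock are injectable so the loop can be tested offline.
    """
    start = clock()
    while True:
        health = get_health()
        if health.get("number_of_nodes", 0) >= expected_nodes:
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError("node did not rejoin within %.0f s" % timeout_s)
        sleep(interval_s)
```

Start it right after stopping the node; if the returned time is comfortably below the delayed allocation timeout, relying on the delay is probably fine. It is still worth waiting for the cluster status to return to green before moving on to the next node.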