We have many independent Elasticsearch clusters running on version 2.x and we are currently looking into ways to perform automated updates of these clusters to version 5.x.
Our deployments are managed with BOSH (an open source tool for release engineering, deployment, lifecycle management, and monitoring of distributed systems). With BOSH, new releases are rolled out automatically using a canary deployment strategy. This means that if we update an Elasticsearch cluster from version 2.4 to 5.0, the first node is updated to 5.0 and then tries to join the still-running 2.4 nodes. This fails, because an update between major versions of Elasticsearch requires a full cluster restart. Our problem is that a full cluster restart is not possible with BOSH's canary strategy, and in addition the major-version jump can only be detected once the first node is already being updated.
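To illustrate the detection part: in principle the canary could notice the major-version jump before restarting itself, by comparing the version it is about to install against the `version.number` a running node reports on `GET /`. A minimal sketch (function names and version strings are illustrative assumptions, not part of our tooling):

```shell
#!/bin/sh
# Extract the major version from an Elasticsearch version string, e.g. "2.4.6" -> "2".
major_of() {
  printf '%s\n' "$1" | cut -d. -f1
}

# Hypothetical check a canary could run before upgrading itself:
# compare the target release version against what the live cluster reports
# (the second argument would come from GET / -> version.number).
needs_full_restart() {
  target="$1"
  running="$2"
  [ "$(major_of "$target")" != "$(major_of "$running")" ]
}

if needs_full_restart "5.0.0" "2.4.6"; then
  echo "major version jump: full cluster restart required"
fi
```

Even with such a check in place, the canary still has no way to act on it, which is the core of the problem described below.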
To resolve this problem, we looked into several different solutions, but none of them seemed to work:
- Shut down each node via its start/stop script: this would require ssh + root access from every Elasticsearch node to every other node of the cluster, which is nothing we want to grant.
- Use the Elasticsearch `_shutdown` API: no longer present since 2.x.
- Reimplement the `_shutdown` API as an Elasticsearch plugin, based on https://github.com/elastic/elasticsearch/commit/d164526d2735a04960f224a8e40912e1fdf91570: not successful, because we found no way to access the `org.elasticsearch.node.Node` instance of the running node (for Elasticsearch 2.x it might be possible via Guice dependency injection, but for version 5.x this no longer works).
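For clarity, the first (rejected) option would amount to something like the following, run from the canary; host names and the service name are illustrative assumptions, and the sketch only prints the commands instead of executing them:

```shell
#!/bin/sh
# Sketch of the rejected option 1: the canary shuts down all nodes over ssh.
# This is exactly why every node would need passwordless root ssh to every
# other node -- access we do not want to grant.
NODES="${NODES:-es-node-1 es-node-2 es-node-3}"   # assumed host names

stop_commands() {
  for host in $NODES; do
    # service name / init system (service vs. systemctl) varies per distro
    echo "ssh root@$host 'service elasticsearch stop'"
  done
}

# Print the commands; piping through `sh` would actually run them.
stop_commands
```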
So my questions are:
- Is there a way to implement the `_shutdown` API as an Elasticsearch plugin that works for both version 2.x and 5.x?
- Are there other ways to perform a full cluster shutdown (the restart would then be performed by BOSH), executed from the first updated node (the canary)?
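For context: the preparation steps of the documented full-cluster-restart procedure can already be driven from the canary over the REST API on both 2.x and 5.x; only the actual process shutdown is missing. A sketch (the `ES_URL` default and function name are assumptions; it prints the commands rather than executing them):

```shell
#!/bin/sh
# Pre-shutdown steps from the full-cluster-restart procedure, driven from the
# canary over the REST API. Works on 2.x and 5.x alike.
ES_URL="${ES_URL:-http://localhost:9200}"

prepare_full_restart() {
  # 1. Disable shard allocation so the cluster does not try to rebalance
  #    while the nodes go down one after another.
  echo "curl -XPUT '$ES_URL/_cluster/settings' -d '{\"transient\":{\"cluster.routing.allocation.enable\":\"none\"}}'"
  # 2. Synced flush so shard recovery after the restart is fast.
  echo "curl -XPOST '$ES_URL/_flush/synced'"
  # 3. Missing step: stopping the Elasticsearch process on every node --
  #    the part neither the REST API nor BOSH's canary strategy provides.
}

# Print the commands; pipe through `sh` to execute them for real.
prepare_full_restart
```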