Automated full cluster shutdown/restart (possible to implement _shutdown API as a plugin?)

We have many independent Elasticsearch clusters running on version 2.x and we are currently looking into ways to perform automated updates of these clusters to version 5.x.

Our deployments are managed with Bosh (BOSH is an open source tool for release engineering, deployment, lifecycle management, and monitoring of distributed systems). With Bosh, new releases are rolled out in an automated fashion, using a canary deployment strategy. This means that if we update an Elasticsearch cluster from version 2.4 to 5.0, the first node gets updated to version 5.0 and then tries to connect to the still-running 2.4 nodes. This fails, because an update between different major versions of Elasticsearch requires a full cluster restart. Our problem is that a full cluster restart is not possible with this Bosh deployment strategy, in part because an update between different major versions may only be detected once the first node is already being updated.
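To make the detection problem concrete, here is a minimal sketch of a check that could run before the canary is touched. It assumes the node versions are read from a nodes info response (`GET /_nodes`, where each node entry carries a `version` string); the function names and the sample data are hypothetical, and how such a check would be hooked into a Bosh deployment is not shown.

```python
def major(version):
    """Return the major component of a version string such as '2.4.1'."""
    return int(version.split(".")[0])


def needs_full_cluster_restart(nodes_info, target_version):
    """Return True if any running node's major version differs from the target's.

    nodes_info is assumed to be the parsed JSON of a nodes info response,
    i.e. {"nodes": {"<node id>": {"version": "2.4.0", ...}, ...}}.
    """
    running_majors = {major(node["version"]) for node in nodes_info["nodes"].values()}
    return any(m != major(target_version) for m in running_majors)


# Hypothetical sample shaped like a nodes info response for a two-node 2.4 cluster.
sample = {"nodes": {"a1": {"version": "2.4.0"}, "b2": {"version": "2.4.0"}}}

print(needs_full_cluster_restart(sample, "5.0.0"))  # 2.x -> 5.x crosses a major version
print(needs_full_cluster_restart(sample, "2.4.1"))  # stays within 2.x
```

A check like this only tells us that a full cluster restart is needed; it does not solve the question of how to stop all nodes from the canary.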

To resolve this problem, we looked into several different solutions, but none of them seemed to work:

  • Shut down the nodes via start/stop script: this would need ssh + root access from each Elasticsearch node to every other node of the Elasticsearch cluster, which is nothing we want to grant.
  • Use the Elasticsearch _shutdown API: no longer present as of 2.x.
  • Reimplement the _shutdown API as an Elasticsearch plugin based on https://github.com/elastic/elasticsearch/commit/d164526d2735a04960f224a8e40912e1fdf91570: not successful, because we found no way to access the org.elasticsearch.node instance of the running node (for Elasticsearch 2.x it might be possible by using Guice dependency injection, but for version 5.x this would no longer work).

So my questions are:

  • Is there a way to implement the _shutdown API as an Elasticsearch plugin, which would work for version 2.x and 5.x?
  • Are there other possibilities to perform a full cluster shutdown (restart would then be performed by Bosh), executed from the first updated node (canary)?

You probably couldn't implement this as a plugin; the security manager changes would likely prevent you from calling your OS-level service manager to shut the node down.
