Rolling restarts issue


I made a change in elasticsearch.yml (disabling Shield for testing). First, I executed the following PUT to the cluster settings API on one of the master nodes directly:

    "transient" : {
        "cluster.routing.allocation.enable" : "none"

I then rebooted the node, but now I am stuck and do not know whether that master node has rejoined the cluster. There are 3 master nodes total.

I tried changing 'none' to 'all' and running the same PUT against that node, but now I am getting nothing but `master_not_discovered_exception` errors in the response. The cluster seems to be down at this point.
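For reference, re-enabling allocation is the same request with the value flipped (a sketch, assuming the cluster settings API; Shield credentials omitted):

    PUT _cluster/settings
    {
        "transient" : {
            "cluster.routing.allocation.enable" : "all"
        }
    }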

Not sure what to do now...

On one of the master nodes that was not restarted, curl `_cat/master` or `_cat/nodes` to see which nodes are currently part of the cluster. I would also take a look at the log file on the node that you restarted, and see why it was not able to join the cluster.
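Something like this, assuming Elasticsearch is listening locally on the default port 9200 (add credentials if Shield is still enforcing authentication):

    curl -s 'localhost:9200/_cat/master?v'
    curl -s 'localhost:9200/_cat/nodes?v'

The `?v` flag adds column headers, which makes it easier to spot which node is missing.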

You should not re-enable allocation during a rolling restart until the restarted node has rejoined the cluster; re-enabling it earlier will cause the shards to be redistributed evenly amongst the nodes that are currently in the cluster.
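The per-node sequence looks roughly like this (a sketch using default settings, not a verbatim copy of the official procedure):

    # 1. Disable shard allocation
    PUT _cluster/settings
    { "transient": { "cluster.routing.allocation.enable": "none" } }

    # 2. Stop the node, perform maintenance, start it again

    # 3. Wait until the node appears here again
    GET _cat/nodes?v

    # 4. Only then re-enable allocation
    PUT _cluster/settings
    { "transient": { "cluster.routing.allocation.enable": "all" } }

Then repeat for the next node once the cluster health has recovered.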

Well, I said whatever and just rebooted all of the master and data nodes at the same time, and now the cluster is back. Thankfully this was not a production environment, but if it were, this would have been a serious problem.

I followed the rolling restart instructions exactly as written, and this is a very basic cluster, so it is rather unacceptable to me that nothing is mentioned about watching out for issues like this.

Also, I already looked at the logs (in /var/log/elasticsearch). The only log file updated today contains nothing but an error about retrieving cluster health due to the Shield license expiration. I assume that would not affect a node's ability to join the cluster, correct?

I see nothing else besides that in the log file.

No, that should not prevent a node from joining; the behavior of each plugin's license expiration is documented.

It is hard to say what exactly went wrong. You might find this article helpful in understanding your cluster health. However, with the Shield license expired, the cluster health, cluster stats, and index stats APIs are blocked. If you have a support subscription, you should reach out directly to get your license situation resolved and to further troubleshoot what went wrong in your cluster.