How best to take Elasticsearch master nodes down, make a configuration change, and then start them back up?

I have to take down all the master nodes at once and then start them all back up again. My scenario: I'm on Kubernetes and I need to change the storage type backing the master nodes, and you can't just update one node at a time - the storage class for a Kubernetes StatefulSet can't be changed in place. To change it, you must delete the old StatefulSet, change the storage class, and redeploy. I'm wondering what the best way to do this is. I could change the StatefulSet to use the new storage class, delete the master nodes, and start them up again with the change. It's my understanding that cluster state is stored on the data nodes, so I think maybe the master nodes that start up will get that data from the data nodes. Any ideas?
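For context, the immutability lives in the StatefulSet's `volumeClaimTemplates`. A minimal sketch of what I mean (names, image version, and sizes here are hypothetical, not from my actual chart):

```yaml
# Hypothetical master-node StatefulSet fragment. storageClassName sits inside
# volumeClaimTemplates, which Kubernetes treats as immutable: applying a
# different value is rejected, so the StatefulSet has to be deleted and
# recreated to change it.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-master            # hypothetical name
spec:
  serviceName: es-master
  replicas: 3
  selector:
    matchLabels:
      app: es-master
  template:
    metadata:
      labels:
        app: es-master
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: old-ssd   # <- the field that can't be updated in place
        resources:
          requests:
            storage: 10Gi
```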

No, that's not how it works. The cluster state is only stored on master nodes which therefore must have storage that persists across restarts. See these docs:

IMPORTANT: Master nodes must have access to the data/ directory (just like data nodes) as this is where the cluster state is persisted between node restarts.

Ok. Thanks for the info. Do you have any recommendations on what to do in my scenario?

As long as the data directory persists across restarts, you should be fine. You're doing a full cluster restart, and there are detailed instructions for this in the manual.

So the data nodes store cluster state for client information purposes only?

" For debugging purposes, you can retrieve the cluster state local to a particular node by adding local=true to the query string."

The data directory for the masters won't persist. By changing the Kubernetes storage class, it's as if I've got my data on one SSD and I'm going to replace it with another SSD, but I can't just swap it out. I need to take down the masters, change the pointer to the new storage that has no data in it, and then start up the masters.

One idea I have is to make my data nodes master-eligible, then take down the masters, make the change, and then remove the master role from the data nodes.

Yes, pretty much.

Once the nodes are shut down you can move the data path to a new location "by hand".

Thanks for the info, but in my Kubernetes world of dynamically provisioned storage, storage classes and Helm charts, it's not so easy.
I'm thinking of changing my data nodes to be data nodes and master nodes - temporarily - then taking down the masters, making the change, bringing them up, and then removing the master role from the data nodes. Do you see a problem with that?
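For anyone reading later, the temporary role change would look something like this in elasticsearch.yml (a sketch assuming a pre-7.9 cluster with the boolean role flags; on 7.9+ the equivalent is `node.roles`):

```yaml
# elasticsearch.yml on the data nodes, temporarily made master-eligible.
# Pre-7.9 boolean flags; on 7.9+ you would instead set:
#   node.roles: [ master, data ]
node.master: true   # temporary: promote data nodes to master-eligible
node.data: true
# ...after the real masters are back on their new storage, revert to:
# node.master: false
# node.data: true
```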

Sounds extraordinarily complex compared with simply moving some files around, but it is feasible to do it that way too. I recommend you read the docs carefully, since removing master-eligible nodes from a cluster sometimes needs extra steps (voting configuration exclusions).

Running Elastic on Kubernetes is extraordinarily complex sometimes.
I'm afraid that if I take the masters down, change to the new storage class, and then try to move the data back into the data directory before the master nodes start up, there may be a race condition. The new masters may come up, decide this is a brand-new cluster, and publish that empty cluster state to all the data nodes, which might wipe out everything. In Kubernetes, the dynamic provisioner assigns storage right before a master node comes up, so it's not clear how I can jump in and load the directory with data before the master nodes start.

I'm curious why you think option a is the complex one.

a) Make the data nodes master and data, take down the masters and make the change, start them up again, and then remove the master role from the data nodes.
b) Back up the master data, take down the master nodes, somehow move that data onto the masters' dynamically provisioned storage before they start up, and then let the master nodes start. And hope there was no race condition that allowed the master nodes to publish an empty cluster state to all the data nodes.

It's not like regular hard drives and servers where you have full control over everything and can just update one at a time.

No, there are at least two protections against that. Firstly, if you follow the manual and remove cluster.initial_master_nodes from your config then no new cluster will form. Secondly, even if you don't follow those instructions, the new cluster would get a new cluster UUID and the data nodes would ignore it since it doesn't match the old cluster UUID.
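For reference, the relevant bits of elasticsearch.yml (a sketch; node and service names are hypothetical):

```yaml
# cluster.initial_master_nodes is only for bootstrapping a brand-new cluster.
# Leaving it set on restarted masters with empty data paths is what risks a
# new cluster forming, so remove or comment it out after the first successful
# cluster formation:
# cluster.initial_master_nodes:
#   - es-master-0
#   - es-master-1
#   - es-master-2

# Discovery of the existing cluster still works via the seed hosts:
discovery.seed_hosts:
  - es-master-headless   # hypothetical headless service for discovery
```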

Scary stuff.

Ok. I'm thinking of creating a new master StatefulSet with the new storage type and then bringing down the old master StatefulSet. I'll try it in test, and if it works I'll update this post for others' future reference.

Yea, it is scary. It's a pain. I've been buying Elastic stock because I don't think joe shmoe admin will be doing this stuff for long. People'll just use your SaaS solution. Everything's going to the cloud!

So if anyone is following along and you are running Elastic on Kubernetes and have to change the storage class for your master nodes, there's a simple solution:

  1. Spin up a new StatefulSet of masters - using either ECK or Helm - with the new storage class. Be sure it has a different name than the old one.
  2. Spin down the old StatefulSet.
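A sketch of step 1, with hypothetical names and sizes - the spec is the same as the old masters apart from the StatefulSet name, the labels, and the storage class:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-master-v2          # must differ from the old StatefulSet's name
spec:
  serviceName: es-master      # hypothetical: same discovery service as the old masters
  replicas: 3
  selector:
    matchLabels:
      app: es-master-v2
  template:
    metadata:
      labels:
        app: es-master-v2
    # ...same Elasticsearch container spec as the old masters...
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: new-ssd   # the new storage class
        resources:
          requests:
            storage: 10Gi
```

The idea is that the new masters discover and join the existing cluster through the same discovery service, and the cluster state replicates onto their new persistent volumes, before the old StatefulSet is scaled down.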

