This makes the problem a lot simpler. Planning for CA changes can require a few steps, but just updating node certs is pretty easy.
Since I want the services to remain uninterrupted, I assume that I need to perform a rolling restart rather than a full cluster restart.
Not necessarily. Elasticsearch monitors the SSL resources for updates, so you can just copy the new certificate and key files (or keystore) into place and the node will pick them up.
As the certificates are different, how would the updated node know to re-join the cluster?
There's nothing that should be necessary here.
If you restart the node, then it joins a cluster by
- connecting to the seed hosts
- establishing a trusted SSL connection
- verifying that those hosts are part of a cluster with the same name (cluster.name in elasticsearch.yml)
Steps 1 & 3 don't depend on the certificates at all, so nothing changes there.
Step 2 will work without any changes, because you haven't switched CAs.
If you update the certificates in place, then any existing connections to the cluster will be fine. Certificate verification only takes place when a connection is first established, so there's no need to force the node to re-join.
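Because verification only happens on new connections, one way to see which certificate a node is currently handing out is to open a fresh TLS connection to it and compare fingerprints with the file you copied into place. This is only a rough sketch: the function names are made up for illustration, and it assumes the port you probe accepts a handshake without a client certificate (a port that requires client authentication would need `context.load_cert_chain(...)` added).

```python
import hashlib
import socket
import ssl

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def pem_fingerprint(pem_text: str) -> str:
    """SHA-256 fingerprint of the first certificate found in a PEM string."""
    begin = pem_text.index(BEGIN)
    end = pem_text.index(END, begin) + len(END)
    der = ssl.PEM_cert_to_DER_cert(pem_text[begin:end])
    return hashlib.sha256(der).hexdigest()


def served_fingerprint(host: str, port: int, timeout: float = 5.0) -> str:
    """SHA-256 fingerprint of the certificate presented on a NEW connection."""
    context = ssl.create_default_context()
    # We only want to look at the certificate, not validate it,
    # so verification is switched off for this probe.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()


# Hypothetical usage (host, port and path are placeholders):
#   with open("/path/to/new-node.crt") as f:
#       expected = pem_fingerprint(f.read())
#   print(served_fingerprint("node1.example.com", 9200) == expected)
```

If the two fingerprints match, new connections are being served with the new certificate. If they don't, the node has not picked up the change yet, or the wrong file was updated.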
What are the steps?
First decide whether you want to do an in-place update or a rolling restart. Both can work. The rolling restart is a little bit safer (for reasons I will explain below) but has all the complications of restarting nodes (disabling allocation, clients being disconnected, etc). An in-place update avoids the restart issues, but you need to monitor the nodes for some time after the change to make sure everything worked correctly.
There are 2 reasons why in-place updates have slightly more risk:
- If you use PEM files, then your certificate and key are in different files so you need to update them simultaneously or the node may experience a temporary period where it cannot establish new connections.
- Updating the certificate & key does not automatically force existing connections to be refreshed, so if you do something wrong the node may look like it's working correctly, but that is because it still has existing connections. It's possible to make a mistake that leaves the node in a state where it cannot establish new connections with other nodes (and therefore cannot recover from a network outage or a node restart).
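For the first risk, one way to keep the mismatch window short is to stage both new files next to the live ones and then rename them into place back to back. The sketch below is an illustration under assumptions, not an official procedure: `stage_copy` and `swap_pem_pair` are hypothetical helper names, the two renames are each atomic but not atomic as a pair (so the window is reduced, not eliminated), and file ownership is not preserved, so run it as the user that owns the live files.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_copy(source: Path, target: Path) -> Path:
    """Copy source into target's directory under a temporary name."""
    fd, tmp_name = tempfile.mkstemp(
        prefix=target.name + ".", suffix=".new", dir=target.parent
    )
    os.close(fd)
    shutil.copyfile(source, tmp_name)
    if target.exists():
        # Keep the permission bits of the file being replaced.
        shutil.copymode(target, tmp_name)
    return Path(tmp_name)


def swap_pem_pair(new_cert, new_key, live_cert, live_key) -> None:
    """Replace the live certificate and key with as small a gap as possible."""
    # Do the slow work (copying) first, for both files.
    staged_cert = stage_copy(Path(new_cert), Path(live_cert))
    staged_key = stage_copy(Path(new_key), Path(live_key))
    # Then two quick renames; each one is atomic on the same filesystem.
    os.replace(staged_key, live_key)
    os.replace(staged_cert, live_cert)


# Hypothetical usage (all paths are placeholders):
#   swap_pem_pair("new/node.crt", "new/node.key",
#                 "/etc/elasticsearch/certs/node.crt",
#                 "/etc/elasticsearch/certs/node.key")
```

After the swap you still need to check that the node can establish new connections, because existing connections will keep working either way.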
Warning: These steps assume that the CA isn't changing. I know that's true for you, it might not be true for everyone who reads this post, and I don't want them to get into trouble.
Via a rolling restart
- Follow all the steps of a rolling restart.
- While each node is stopped (the Perform any needed changes step), switch the node's SSL certificate and key. You can do this by either:
  - Changing elasticsearch.yml to point to the new file locations
  - Changing the contents of the existing SSL resources to contain your new certificate and key.
It really is that simple. If you are using PEM then you need to change the .key and .certificate files. If you are using PKCS#12 or JKS then you just have 1 file to change, but it is recommended that you replace the existing entry rather than adding a second entry.
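For the "disabling allocation" part of the rolling restart mentioned earlier, the usual approach is a cluster settings update before stopping each node and a reset once it has rejoined. Here is a minimal sketch that only builds the requests; `allocation_request` is a hypothetical helper, the base URL is a placeholder, and a secured cluster would also need authentication and a suitable SSL context when the request is actually sent.

```python
import json
import urllib.request


def allocation_request(base_url: str, value) -> urllib.request.Request:
    """Build a PUT to _cluster/settings for cluster.routing.allocation.enable.

    Pass "primaries" before stopping a node, and None afterwards to reset
    the setting to its default.
    """
    body = json.dumps(
        {"persistent": {"cluster.routing.allocation.enable": value}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical usage (URL is a placeholder; add auth/TLS options as needed):
#   before = allocation_request("https://node1.example.com:9200", "primaries")
#   after = allocation_request("https://node1.example.com:9200", None)
#   urllib.request.urlopen(before)
```

Building the request separately from sending it makes it easy to inspect what will be sent before you touch a live cluster.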
Via in-place certificate updates