ECK is great at updating Elasticsearch nodes in a rolling fashion, but I was wondering whether there are any recommendations for how to upgrade the underlying Kubernetes worker nodes that Elasticsearch is running on.
My thought is to bring up a new set of upgraded Kubernetes nodes, taint the old set, and force ECK to do a rolling restart so that the ES cluster comes up on the new Kubernetes nodes. That said, I'm not sure of the best way to force ECK to start a rolling upgrade.
If you want to just move the pods to the new nodes, I think you could simply drain the old nodes (assuming you don't have any local volumes provisioned) to get Kubernetes to move the pods for you.
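A minimal sketch of that drain approach, assuming placeholder node names (`old-node-1`, etc.); depending on your kubectl version the emptyDir flag may be `--delete-local-data` instead:

```sh
# Cordon first so no new pods land on the old worker node,
# then drain it to evict the Elasticsearch pods onto the new nodes.
kubectl cordon old-node-1
kubectl drain old-node-1 --ignore-daemonsets --delete-emptydir-data

# Repeat for the remaining old nodes, waiting for the Elasticsearch
# cluster health to return to green between drains.
```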
Updating the pod template in the Elasticsearch manifest will trigger a rolling restart. It can be something simple like adding an annotation to the pod template.
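For example, a rough sketch of an Elasticsearch manifest where an arbitrary annotation (here `restart-trigger`, a made-up name, on a cluster assumed to be called `my-cluster`) is added to the pod template; bumping its value causes ECK to roll the nodeSet:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.10.0            # assumed version
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      metadata:
        annotations:
          restart-trigger: "1"   # change this value to trigger a rolling restart
```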
@charith-elastic If I just drained the old nodes, I don't think that would be safe for ES, would it?
I assume that when ECK upgrades nodes it disables shard allocation, runs a synced flush, guarantees that only one node is down at a time, and waits for the cluster to go back to green before moving on to the next one. Draining Kubernetes nodes alone wouldn't guarantee any of that.
Thanks for the tip with adding an annotation. I'll try that.
Yes, it is certainly not graceful. However, ECK creates a default pod disruption budget per cluster, and the drain operation should honour that. Assuming you have enough replicas in your cluster, one pod disappearing (the default PDB rule) should not be too disruptive. Of course, your risk profile might be different and you may want to do something less risky, like tainting the nodes and restarting the Elasticsearch cluster as mentioned.
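For the taint-based approach, a rough sketch (node names and the taint key are placeholders):

```sh
# Prevent new pods from being scheduled on the old worker nodes.
kubectl taint nodes old-node-1 node-upgrade=true:NoSchedule
kubectl taint nodes old-node-2 node-upgrade=true:NoSchedule

# Then trigger an ECK-managed rolling restart (e.g. by changing an
# annotation in the pod template as described above); each recreated
# pod can only be scheduled onto the new, untainted nodes.
```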
What happens if we use self-managed node groups and the rolling update of the nodes happens automatically? How can we control that the data pods have finished reallocating shards before the next node is replaced?
Is there any option to request a restart of a specific pod from ECK, so that the restart is managed by the operator and follows the same algorithm used for a normal rolling restart or upgrade? I would like to trigger it from an external process, for example by setting an annotation on the pod.
The idea is to block the node drain request using a PDB with maxUnavailable=0, plus a process that receives the eviction event and annotates the pod for a "manual" coordinated restart.
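As a rough illustration of the blocking PDB part of that idea, assuming a cluster named `my-cluster` and the `elasticsearch.k8s.elastic.co/cluster-name` label that ECK puts on its pods (on older Kubernetes versions the API group would be `policy/v1beta1`, and ECK's own default PDB would also need to be overridden or disabled in the Elasticsearch spec):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-cluster-block-evictions
spec:
  maxUnavailable: 0          # voluntary evictions (e.g. kubectl drain) are rejected
  selector:
    matchLabels:
      elasticsearch.k8s.elastic.co/cluster-name: my-cluster
```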