I just saw that ECK is now generally available and out of beta stage. Was the above issue fixed then? Has anyone looked into it?
Could you please retry with 1.0.0 and paste the logs here? This new release should produce some additional information when the upgrade process is stuck.
    updateStrategy:
      changeBudget:
        maxSurge: 2
        maxUnavailable: 0
If you were using beta-1, it is surprising that you are not hitting this bug: https://github.com/elastic/cloud-on-k8s/issues/2034 (it is fixed in 1.0.0). (Edit: actually, I think you will hit it once you apply the suggested fix below.)
The root cause of your problem is that you are not allowing even a single Pod to be unavailable, so ECK can't restart any Pod.
Try the following:
    updateStrategy:
      changeBudget:
        maxSurge: 2
        maxUnavailable: 1
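For context, here is a minimal sketch of where that changeBudget block sits in a complete Elasticsearch resource, assuming the ECK 1.0 v1 API. The cluster name (quickstart), the Elasticsearch version, the node set name and the node count are illustrative placeholders, not values from this thread; only the updateStrategy block comes from the suggestion above.

```yaml
# Sketch assuming the ECK 1.0 "v1" API; name, version, nodeSets are placeholders.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart          # placeholder cluster name
spec:
  version: 7.5.1            # placeholder Elasticsearch version
  nodeSets:
    - name: default         # placeholder node set
      count: 3              # placeholder node count
  updateStrategy:
    changeBudget:
      maxSurge: 2           # up to 2 extra Pods may be created above the target count
      maxUnavailable: 1     # allows one Pod at a time to be taken down and restarted
```

The manifest can be applied with kubectl apply -f followed by the file name.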
Hmm... I will try the new GA version.
I thought that it would add an extra node before removing one, as the maxSurge setting allows that.
If I remember correctly, I took the maxUnavailable: 0 setting from some documentation page or Elastic forum post. Now I remember that they were talking about a bug in an ECK beta (or even alpha) version where setting it to 1 or more would cause issues (something like that). That is why my config was:

    maxSurge: 2
    maxUnavailable: 0
Thanks. I will post back with my results after trying the GA version with maxUnavailable > 0.
It seems to work fine now. I will open another question if I run into any issues.
Thanks for your help.