Hello,
I wanted to share some insights that we came across during an upgrade we did recently.
We upgraded the ECK operator from 2.16+ to 3+, since only 3+ supports Elasticsearch 9+.
The Elasticsearch upgrade path was 8.19.11 → 9.1.9 at first, since the upgrade paths in the documentation clearly listed this.
We use Elasticsearch as a small stateless search engine that can be repopulated very quickly, so losing data or removing disks was not a problem for us. We do not have Kibana, so the problem I describe next was not visible to us before we tried the upgrade. The Upgrade Assistant might have caught it, but I am not sure.
So the TL;DR version of the error is this:
We were also able to pinpoint the node_shutdown field names as the culprit: the first node failed to restart as soon as the rolling restart began after we synced the 9.1.9 version.
The field name in the metadata left by the 8.19.11 node could not be read by the newly rolled 9.1.9 node, so it failed completely on startup. Two solutions were suggested: the first was to remove the node_shutdown metadata, which did not help; the second was to remove the PVC of the failing Elasticsearch node so that the node metadata is removed with it (which we did not go through with, because it seemed like overkill).
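For anyone who wants to see whether shutdown records are present before rolling, here is a minimal Python sketch. It assumes the cluster answers `GET _nodes/shutdown` at a base URL you supply, with no auth or TLS handling, and that the response body has a top-level `nodes` list whose entries carry a `node_id`. The helper names are mine and purely illustrative. Keep in mind that removing the records did not fix the startup failure for us, so this is a diagnostic only.

```python
import json
import urllib.request


def nodes_with_shutdown_records(response: dict) -> list[str]:
    """Return the node IDs that have a shutdown record in the response body.

    Assumes the body looks like {"nodes": [{"node_id": "...", ...}, ...]}.
    """
    return [
        entry["node_id"]
        for entry in response.get("nodes", [])
        if "node_id" in entry
    ]


def fetch_shutdown_records(base_url: str) -> dict:
    """Call GET <base_url>/_nodes/shutdown and decode the JSON body.

    Hypothetical helper: add authentication and TLS settings for a real cluster.
    """
    with urllib.request.urlopen(f"{base_url}/_nodes/shutdown") as resp:
        return json.load(resp)
```

Usage would be something like `nodes_with_shutdown_records(fetch_shutdown_records(base_url))`, where `base_url` is whatever address your cluster is exposed on.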
After a couple more tries, we read that Elasticsearch 9.2.5 fixed this bug. We reverted to 8.19.11, got back to a green state, then tried the upgrade with 9.2.5, and it worked. So even though the documentation seems to point at 9.0+ or 9.1+ as an entry point for Elasticsearch 9 upgrades, all Elasticsearch versions from 9.0.0 to 9.2.4 have this bug and will cause your upgrade to fail.
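If you want to guard against this in a deploy script, a small check like the following could work. It is only a sketch: the affected range (9.0.0 up to, but not including, 9.2.5) is what we observed and read, not something I verified against every release, and the function names are made up for illustration. It also assumes plain `major.minor.patch` version strings.

```python
AFFECTED_FROM = (9, 0, 0)  # first version we believe is affected (assumption)
FIXED_IN = (9, 2, 5)       # version where the upgrade worked for us


def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_risky_target(target: str) -> bool:
    """True if the target version falls inside the range we saw failing."""
    return AFFECTED_FROM <= parse_version(target) < FIXED_IN
```

With this, `is_risky_target("9.1.9")` is true and `is_risky_target("9.2.5")` is false, so a pipeline could refuse to roll an 8.19.x cluster onto a version in the affected range.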
Also, if we had updated the node_shutdown metadata and let 9.1.9 create it again, it would have written the field name “shutdown_startedmillis”, and on a later version we would have had to change it back to its original field name. So going straight to 9.2.5 looked like the clear winner.
Just wanted to share.
Cheers.

