We currently have an Elasticsearch cluster with a valid license, consisting of 7 nodes deployed on VMs. The current version is 7.16.2. We are planning to upgrade both the ELK Stack and the underlying operating system. Our target is to upgrade Elasticsearch to 8.18 and the OS from RHEL 7 to RHEL 9.
Could you please provide your recommended upgrade steps and any important considerations or best practices for this specific upgrade path? We would appreciate guidance on the optimal sequence of upgrades (e.g., OS first, then Elasticsearch, or vice versa) and any potential compatibility issues or prerequisites we should be aware of.
As long as you stay within the support matrix, it doesn't really matter in which order you upgrade things. However, note that 7.16 is not supported on RHEL 9 whereas 8.18 is supported on RHEL 7, so that forces you to upgrade Elasticsearch first.
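To make that reasoning concrete, here is a minimal sketch that encodes only the support facts mentioned in this thread and checks which ordering passes through a supported intermediate state. The `SUPPORTED` table and the `valid_orders` helper are illustrative names, not part of any Elastic tooling, and the table is not a copy of the official support matrix, so check the real matrix before relying on it.

```python
# Support facts as stated in this thread (NOT the official support matrix).
SUPPORTED = {
    ("7.16", "RHEL7"): True,   # the current production combination
    ("7.16", "RHEL9"): False,  # stated above: not supported
    ("8.18", "RHEL7"): True,   # stated above: supported
    ("8.18", "RHEL9"): True,   # the target combination
}


def valid_orders(start, target):
    """Return the upgrade orders whose intermediate state is supported."""
    es_from, os_from = start
    es_to, os_to = target
    # Each ordering changes one thing first, leaving the other at its old value.
    candidates = {
        "Elasticsearch first": (es_to, os_from),
        "OS first": (es_from, os_to),
    }
    return [name for name, state in candidates.items()
            if SUPPORTED.get(state, False)]


print(valid_orders(("7.16", "RHEL7"), ("8.18", "RHEL9")))
# prints ['Elasticsearch first']
```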
We've evaluated the situation and come up with two potential approaches for the upgrade. We'd appreciate your guidance on which one you deem more suitable:
Approach 1: Build New Node, Then Decommission Old Node (Node-by-Node Replacement)
1. Provision a new VM with the target OS (RHEL 9), new JDK, and the new Elasticsearch version (8.18).
2. Remove an existing Elasticsearch node (7.16.2) from the current cluster.
3. Migrate the configuration files and data from the old Elasticsearch node to the new VM environment.
4. Add the new Elasticsearch node (8.18) to the cluster.
5. Repeat for all remaining nodes.
Approach 2: In-Place Upgrade (Direct Node Upgrade)
1. Remove an existing Elasticsearch node (7.16.2) from the current cluster.
2. Uninstall the old JDK and the old Elasticsearch software from the original ES node.
3. Perform the OS upgrade steps on the original ES node (from RHEL 7 to RHEL 9).
4. Install the new JDK and the new Elasticsearch software (8.18) on the upgraded RHEL 9 OS.
5. Add the new Elasticsearch node (8.18) to the cluster.
6. Repeat for all remaining nodes.
In theory either is fine but in practice it's very risky to upgrade both application software and the OS at the same time as you propose. If you encounter any problems you will struggle to work out whether it was the Elasticsearch upgrade or the OS upgrade that caused them. Just do a regular ES upgrade, make sure everything is working well, and then do the OS upgrade later.
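One way to pin down "make sure everything is working well" between node restarts is to gate on the cluster health response. The sketch below works on an already-parsed `GET _cluster/health` body, so it needs no network access. The `ready_for_next_node` helper and the sample responses are hypothetical, and requiring `green` with no shard movement is a conservative choice made here, not a rule taken from this thread.

```python
def ready_for_next_node(health, expected_nodes=7):
    """Decide from a parsed GET _cluster/health response whether to move on."""
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


# Hypothetical sample responses, trimmed to the fields used above.
just_restarted = {
    "status": "yellow",
    "number_of_nodes": 7,
    "relocating_shards": 0,
    "initializing_shards": 3,
    "unassigned_shards": 5,
}
recovered = {
    "status": "green",
    "number_of_nodes": 7,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0,
}

print(ready_for_next_node(just_restarted))  # prints False
print(ready_for_next_node(recovered))       # prints True
```

In practice you would fetch the health document from the cluster after each node rejoins and only continue with the next node once the check passes.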
Indeed, and one small suggestion would be to test the upgrade process on a non-production cluster first. If necessary, just build a small cluster with RHEL 7 / Elasticsearch 7.16.2, maybe using VMware or VirtualBox or similar, and try to make it as representative as possible.
On the RHEL side, the upgrade path from RHEL 7 to RHEL 9 also includes a stop at RHEL 8.
So that's a 2-step, 7-node Elasticsearch upgrade process (7.16.2 to 7.17 first, since an upgrade to 8.x has to start from 7.17, and then 7.17 to 8.18), followed by a 2-step, 7-node RHEL upgrade process. You should set aside time for this, and be realistic about the time required: that's not something to plan for "Tuesday afternoon" but rather "week 25 and maybe week 26".
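To get a feel for the size of that job, here is a rough sketch that just enumerates the individual node operations. The node names are made up, and the hop labels are assumptions for illustration: the RHEL hops follow the stop at RHEL 8 mentioned above, and the Elasticsearch hops assume 7.17 as the intermediate version.

```python
NODES = [f"node-{i}" for i in range(1, 8)]  # hypothetical names for the 7 nodes

# Assumed hops; 7.17 as the intermediate Elasticsearch version is an assumption.
ES_HOPS = ["7.16.2 -> 7.17", "7.17 -> 8.18"]
OS_HOPS = ["RHEL 7 -> RHEL 8", "RHEL 8 -> RHEL 9"]


def build_plan():
    """List every (phase, hop, node) operation, Elasticsearch phase first."""
    plan = []
    for phase, hops in (("Elasticsearch", ES_HOPS), ("OS", OS_HOPS)):
        for hop in hops:
            for node in NODES:
                plan.append((phase, hop, node))
    return plan


plan = build_plan()
print(len(plan))  # prints 28
print(plan[0])    # prints ('Elasticsearch', '7.16.2 -> 7.17', 'node-1')
```

That is 28 separate node operations, each with its own wait for the cluster to recover, which is why this is a matter of weeks rather than an afternoon.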
I agree with David: best not to try upgrading both the application software and the OS at the same time, for the reasons he gave. It's (almost) always best to minimize risk, even at the cost of time, IMO.
Alternatively, you could just wipe each RHEL 7 machine and start afresh with RHEL 9, as long as you install the same ES version on each to avoid introducing risk there.