Best practices for ECK on EKS with multi-AZ nodegroups and EBS volumes during node upgrades

Hi all,

I'm running an Elasticsearch cluster on EKS with the ECK operator, and I'm trying to understand the best way to handle EKS nodegroup upgrades in a multi-AZ setup, especially with EBS-backed volumes involved.

Here’s a simplified example (I know a 2-node cluster isn’t recommended — this is just for illustration):
I have 2 Elasticsearch nodes, one in AZ A and one in AZ B. Each pod is scheduled in its respective AZ and uses an EBS volume in the same zone. The problem shows up during nodegroup upgrades: EKS recreates the worker nodes in AZs it picks itself (e.g., AZ B and AZ C), so the pod that was in AZ A can be left with no worker node in its zone. Since EBS volumes are AZ-bound, the PersistentVolume's node affinity prevents that pod from being scheduled anywhere else, it stays Pending, and the cluster ends up in yellow status due to the missing data node.
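
For context on the volume side: with the EBS CSI driver, each PersistentVolume carries a node-affinity rule for the zone it was provisioned in, so once a claim is bound the pod can only ever run in that AZ. This is roughly the kind of StorageClass I'm using (gp3 and the name ebs-gp3 are just examples from my side); note that WaitForFirstConsumer only delays provisioning until the first scheduling decision and doesn't help once the volume already exists:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                             # example name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer     # volume is created in the AZ where the pod is first scheduled
reclaimPolicy: Retain                       # keep the volume if the claim is deleted
allowVolumeExpansion: true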

AWS support confirmed that there’s no way to control the AZ placement during nodegroup upgrades — it’s random.

My idea is to create three separate nodegroups (nodegroup-az-a, nodegroup-az-b, nodegroup-az-c), each pinned to a specific AZ. This way, when a nodegroup is upgraded, the new nodes are always recreated in the same AZ. Then, I would define multiple nodeSets in the Elasticsearch manifest, each using a nodeSelector to target the corresponding AZ/nodegroup. This should ensure pods stay in the correct zone and can always access their EBS volumes.
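 
If it helps, this is the shape of the nodegroup layout I'm thinking of, written as an eksctl config (assuming eksctl is used to manage the cluster; the cluster name, instance type and region are placeholders). Each managed nodegroup is restricted to a single AZ, so an upgrade always brings replacement nodes back in the same zone:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster                        # placeholder
  region: eu-west-1
managedNodeGroups:
  - name: nodegroup-az-a
    instanceType: m6i.xlarge              # placeholder
    desiredCapacity: 1
    availabilityZones: ["eu-west-1a"]     # pin this nodegroup to a single AZ
  - name: nodegroup-az-b
    instanceType: m6i.xlarge
    desiredCapacity: 1
    availabilityZones: ["eu-west-1b"]
  - name: nodegroup-az-c
    instanceType: m6i.xlarge
    desiredCapacity: 1
    availabilityZones: ["eu-west-1c"]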

Here’s an example manifest I’m considering:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-cluster
spec:
  version: 8.12.0
  nodeSets:
    - name: az-a
      count: 1
      config:
        node.roles: ["data", "ingest", "master"]
      podTemplate:
        spec:
          nodeSelector:
            topology.kubernetes.io/zone: eu-west-1a
    - name: az-b
      count: 1
      config:
        node.roles: ["data", "ingest", "master"]
      podTemplate:
        spec:
          nodeSelector:
            topology.kubernetes.io/zone: eu-west-1b
    - name: az-c
      count: 1
      config:
        node.roles: ["data", "ingest", "master"]
      podTemplate:
        spec:
          nodeSelector:
            topology.kubernetes.io/zone: eu-west-1c
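
On top of that, each nodeSet would get a volumeClaimTemplate pointing at the zone-aware EBS StorageClass. Something like this under every nodeSet (elasticsearch-data is the claim name ECK expects for the data path; the storage class name and size are placeholders from my setup):

      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data      # ECK's data volume claim name
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi            # placeholder size
            storageClassName: ebs-gp3     # placeholder StorageClass from above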

Has anyone implemented something similar? Are there any caveats or better approaches to ensure data node stability and volume availability during upgrades?

Thanks in advance for any insights!