What did you do?
Followed instructions for making the ES cluster zone aware.
What did you expect to see?
Pods scheduled evenly across all zones.
What did you see instead? Under which circumstances?
3 zones in total, 3-node master cluster; one zone ended up with 2 pods and one zone with 0 pods.
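For reference, the placement can be inspected with the standard kubectl queries (the pod label below is the default ECK cluster-name label):

# which node each Elasticsearch pod landed on
kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=ats-perf -o wide

# which zone each node belongs to
kubectl get nodes -L topology.kubernetes.io/zone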
Environment
AWS EKS
- ECK version: 2.4.0
- Kubernetes information:
  - Cloud: EKS
  - Version: v1.23.7
- Resource definition:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  annotations:
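    # ECK publishes the node labels listed here as pod annotations, so they can be read back via the downward API (see the ZONE env var below)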
    eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
  name: ats-perf
spec:
  version: 7.17.6
  image: slideroom/elasticsearch:7.17.6 # https://github.com/slideroom/infrastructure/blob/master/docker/elasticsearch
  http:
    service:
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        # subjectAltNames:
        #   - dns: elastic.ats.int
        #   - dns: localhost
        disabled: true
nodeSets:
- name: master-v1
count: 3
config:
node.roles: [ "master" ]
node.attr.zone: ${ZONE}
cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
xpack.monitoring.collection.enabled: true
reindex.remote.whitelist: "elastic.ats.int:9200"
action.destructive_requires_name: true
# if not setting max_map_count in an init container, then use this setting
#node.store.allow_mmap: false
podTemplate:
spec:
containers:
- name: elasticsearch
env:
- name: ES_JAVA_OPTS
value: -Xms1g -Xmx1g
- name: ZONE
valueFrom:
fieldRef:
fieldPath: metadata.annotations['topology.kubernetes.io/zone']
resources:
limits:
memory: 2Gi
cpu: 1
nodeSelector:
role: elastic-master
topologySpreadConstraints:
- maxSkew: 1
minDomains: 3
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
elasticsearch.k8s.elastic.co/cluster-name: ats-perf
elasticsearch.k8s.elastic.co/statefulset-name: ats-perf-es-master-v1
serviceAccountName: ats-perf-elastic
# related to "node.store.allow_mmap: false" setting above
initContainers:
- name: sysctl
securityContext:
privileged: true
command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
storageClassName: io2-max-encrypted
    - name: data-v1
      count: 5
      config:
        node.roles: [ "data", "ingest", "transform" ]
        node.attr.zone: ${ZONE}
        cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
        xpack.monitoring.collection.enabled: true
        reindex.remote.whitelist: "elastic.ats.int:9200"
        action.destructive_requires_name: true
        # if not setting max_map_count in an init container, then use this setting
        #node.store.allow_mmap: false
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              env:
                - name: ES_JAVA_OPTS
                  value: -Xms13g -Xmx13g
                - name: ZONE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['topology.kubernetes.io/zone']
              resources:
                requests:
                  memory: 26Gi
                  cpu: 5700m
                limits:
                  memory: 26Gi
                  cpu: 5700m
          nodeSelector:
            role: elastic-data
          topologySpreadConstraints:
            - maxSkew: 1
              minDomains: 4
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: ats-perf
                  elasticsearch.k8s.elastic.co/statefulset-name: ats-perf-es-data-v1
          serviceAccountName: ats-perf-elastic
          # related to "node.store.allow_mmap: false" setting above
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: io2-10-encrypted
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: ats-perf
spec:
  version: 7.17.6
  count: 1
  elasticsearchRef:
    name: ats-perf
  config:
    xpack.monitoring.enabled: true
  podTemplate:
    spec:
      containers:
        - name: kibana
          env:
            - name: NODE_OPTIONS
              value: "--max-old-space-size=2048"
          resources:
            requests:
              memory: 1Gi
              cpu: 0.5
            limits:
              memory: 2.5Gi
              cpu: 2
  http:
    service:
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true
The node groups are set to autoscale. They were each scaled to 1 node beforehand, and once 2 pods got scheduled into one of the zones, the autoscaler scaled the node in the other zone back down because nothing was running on it.
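For reference, which zones still have a node backing each autoscaled group can be checked with (role is the nodeSelector label used in the spec above):

kubectl get nodes -l role=elastic-master -L topology.kubernetes.io/zone
kubectl get nodes -l role=elastic-data -L topology.kubernetes.io/zone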