Zone-aware topologySpreadConstraints not working

What did you do?
Followed the documented steps for making the ES cluster zone aware, and added zone-based topologySpreadConstraints to both nodeSets (see the excerpt below and the full manifest under "Resource definition").
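
For context, the zone-awareness wiring boils down to three pieces: the downward-node-labels annotation so the operator copies the node's zone label onto each pod as an annotation, an environment variable that reads that annotation back via the downward API, and node.attr.zone plus the allocation awareness attributes in the Elasticsearch config. This is only a trimmed excerpt of the full manifest below, not a standalone example:

metadata:
  annotations:
    # the ECK operator copies this node label onto every pod as an annotation
    eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
spec:
  nodeSets:
  - name: master-v1
    config:
      node.attr.zone: ${ZONE}   # zone attribute on the Elasticsearch node
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ZONE          # read the copied annotation back via the downward API
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']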

What did you expect to see?
Pods scheduled evenly across all zones.

What did you see instead? Under which circumstances?
3 zones in total and a 3-node master group: one zone ended up with 2 pods and one zone with 0 pods.

Environment
AWS EKS

  • ECK version:
    2.4.0
  • Kubernetes information:
    Cloud: EKS
    Version: v1.23.7
  • Resource definition:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  annotations:
    eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
  name: ats-perf
spec:
  version: 7.17.6
  image: slideroom/elasticsearch:7.17.6 # https://github.com/slideroom/infrastructure/blob/master/docker/elasticsearch
  http:
    service:
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        # subjectAltNames:
        # - dns: elastic.ats.int
        # - dns: localhost
        disabled: true
  nodeSets:
  - name: master-v1
    count: 3
    config:
      node.roles: [ "master" ]
      node.attr.zone: ${ZONE}
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
      xpack.monitoring.collection.enabled: true
      reindex.remote.whitelist: "elastic.ats.int:9200"
      action.destructive_requires_name: true
      # if not setting max_map_count in an init container, then use this setting
      #node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms1g -Xmx1g
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
          resources:
            limits:
              memory: 2Gi
              cpu: 1
        nodeSelector:
          role: elastic-master
        topologySpreadConstraints:
          - maxSkew: 1
            minDomains: 3
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                elasticsearch.k8s.elastic.co/cluster-name: ats-perf
                elasticsearch.k8s.elastic.co/statefulset-name: ats-perf-es-master-v1
        serviceAccountName: ats-perf-elastic
        # related to "node.store.allow_mmap: false" setting above
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 4Gi
        storageClassName: io2-max-encrypted
  - name: data-v1
    count: 5
    config:
      node.roles: [ "data", "ingest", "transform" ]
      node.attr.zone: ${ZONE}
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
      xpack.monitoring.collection.enabled: true
      reindex.remote.whitelist: "elastic.ats.int:9200"
      action.destructive_requires_name: true
      # if not setting max_map_count in an init container, then use this setting
      #node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms13g -Xmx13g
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
          resources:
            requests:
              memory: 26Gi
              cpu: 5700m
            limits:
              memory: 26Gi
              cpu: 5700m
        nodeSelector:
          role: elastic-data
        topologySpreadConstraints:
          - maxSkew: 1
            minDomains: 4
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                elasticsearch.k8s.elastic.co/cluster-name: ats-perf
                elasticsearch.k8s.elastic.co/statefulset-name: ats-perf-es-data-v1
        serviceAccountName: ats-perf-elastic
        # related to "node.store.allow_mmap: false" setting above
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: io2-10-encrypted

---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: ats-perf
spec:
  version: 7.17.6
  count: 1
  elasticsearchRef:
    name: ats-perf
  config:
    xpack.monitoring.enabled: true
  podTemplate:
    spec:
      containers:
      - name: kibana
        env:
          - name: NODE_OPTIONS
            value: "--max-old-space-size=2048"
        resources:
          requests:
            memory: 1Gi
            cpu: 0.5
          limits:
            memory: 2.5Gi
            cpu: 2
  http:
    service:
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true

The node groups are set to autoscale. Each was scaled to 1 node beforehand; once 2 pods got scheduled into one of the zones, the autoscaler scaled the node in the empty zone back down because nothing was running on it.
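
For reference, one way to keep a node available in every zone (so the spread constraint has somewhere to place pods and the autoscaler cannot drain a whole zone) is one managed node group per availability zone with minSize: 1. This is only a sketch under the assumption that the node groups are managed with eksctl; the cluster name, region, zones, and instance type are placeholders, and the labels mirror the nodeSelector used in the manifest above:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ats-perf            # hypothetical cluster name
  region: us-east-1         # assumption: replace with the real region
managedNodeGroups:
  # one node group per AZ so scale-down never empties a zone entirely
  - name: elastic-master-a
    availabilityZones: ["us-east-1a"]
    instanceType: m5.large  # placeholder
    minSize: 1              # keep at least one node in this zone
    maxSize: 2
    labels:
      role: elastic-master  # matches the nodeSelector in the Elasticsearch manifest
  - name: elastic-master-b
    availabilityZones: ["us-east-1b"]
    instanceType: m5.large
    minSize: 1
    maxSize: 2
    labels:
      role: elastic-master
  - name: elastic-master-c
    availabilityZones: ["us-east-1c"]
    instanceType: m5.large
    minSize: 1
    maxSize: 2
    labels:
      role: elastic-master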
