What did you do?
Followed instructions for making the ES cluster zone aware.
What did you expect to see?
Pods scheduled evenly across all zones.
What did you see instead? Under which circumstances?
3 zones in total, 3-node master cluster; one zone ended up with 2 pods and one zone with 0 pods.
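For reference, the placement can be inspected with the standard kubectl queries (the pod label below is the default ECK cluster-name label):

# which node each Elasticsearch pod landed on
kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=ats-perf -o wide

# which zone each node belongs to
kubectl get nodes -L topology.kubernetes.io/zone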
Environment
AWS EKS
- ECK version: 2.4.0
- Kubernetes information:
  - Cloud: EKS
  - Version: v1.23.7
- Resource definition:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  annotations:
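    # ECK publishes the node labels listed here as pod annotations, so they can be read back via the downward API (see the ZONE env var below)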
    eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
  name: ats-perf
spec:
  version: 7.17.6
  image: slideroom/elasticsearch:7.17.6 # https://github.com/slideroom/infrastructure/blob/master/docker/elasticsearch
  http:
    service:
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        # subjectAltNames:
        #   - dns: elastic.ats.int
        #   - dns: localhost
        disabled: true
nodeSets:
- name: master-v1
count: 3
config:
node.roles: [ "master" ]
node.attr.zone: ${ZONE}
cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
xpack.monitoring.collection.enabled: true
reindex.remote.whitelist: "elastic.ats.int:9200"
action.destructive_requires_name: true
# if not setting max_map_count in an init container, then use this setting
#node.store.allow_mmap: false
podTemplate:
spec:
containers:
- name: elasticsearch
env:
- name: ES_JAVA_OPTS
value: -Xms1g -Xmx1g
- name: ZONE
valueFrom:
fieldRef:
fieldPath: metadata.annotations['topology.kubernetes.io/zone']
resources:
limits:
memory: 2Gi
cpu: 1
nodeSelector:
role: elastic-master
topologySpreadConstraints:
- maxSkew: 1
minDomains: 3
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
elasticsearch.k8s.elastic.co/cluster-name: ats-perf
elasticsearch.k8s.elastic.co/statefulset-name: ats-perf-es-master-v1
serviceAccountName: ats-perf-elastic
# related to "node.store.allow_mmap: false" setting above
initContainers:
- name: sysctl
securityContext:
privileged: true
command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
storageClassName: io2-max-encrypted
    - name: data-v1
      count: 5
      config:
        node.roles: [ "data", "ingest", "transform" ]
        node.attr.zone: ${ZONE}
        cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
        xpack.monitoring.collection.enabled: true
        reindex.remote.whitelist: "elastic.ats.int:9200"
        action.destructive_requires_name: true
        # if not setting max_map_count in an init container, then use this setting
        #node.store.allow_mmap: false
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              env:
                - name: ES_JAVA_OPTS
                  value: -Xms13g -Xmx13g
                - name: ZONE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['topology.kubernetes.io/zone']
              resources:
                requests:
                  memory: 26Gi
                  cpu: 5700m
                limits:
                  memory: 26Gi
                  cpu: 5700m
          nodeSelector:
            role: elastic-data
          topologySpreadConstraints:
            - maxSkew: 1
              minDomains: 4
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: ats-perf
                  elasticsearch.k8s.elastic.co/statefulset-name: ats-perf-es-data-v1
          serviceAccountName: ats-perf-elastic
          # related to "node.store.allow_mmap: false" setting above
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: io2-10-encrypted
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: ats-perf
spec:
  version: 7.17.6
  count: 1
  elasticsearchRef:
    name: ats-perf
  config:
    xpack.monitoring.enabled: true
  podTemplate:
    spec:
      containers:
        - name: kibana
          env:
            - name: NODE_OPTIONS
              value: "--max-old-space-size=2048"
          resources:
            requests:
              memory: 1Gi
              cpu: 0.5
            limits:
              memory: 2.5Gi
              cpu: 2
  http:
    service:
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      spec:
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true
The node groups are set to autoscale. They were each scaled to 1 node beforehand, and once 2 pods got scheduled into one of the zones, the autoscaler scaled the node in the other zone back down because nothing was running on it.
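For reference, which zones still have a node backing each autoscaled group can be checked with (role is the nodeSelector label used in the spec above):

kubectl get nodes -l role=elastic-master -L topology.kubernetes.io/zone
kubectl get nodes -l role=elastic-data -L topology.kubernetes.io/zone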