A few things I see here that could be improved:
- Given that you're using dedicated master nodes, you should ensure that traffic via the cluster's HTTP service is only routed to data nodes. You can do this by adding a selector to the service:
```yaml
spec:
  http:
    service:
      spec:
        selector:
          elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
          elasticsearch.k8s.elastic.co/node-data: 'true'
```
- Look at setting Availability Zone awareness on your nodes. This will help ensure that Elasticsearch evenly distributes shards across zones.
```yaml
spec:
  nodeSets:
  - name: masters
    config:
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
      node.attr.zone: ${ZONE}
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
        initContainers:
        - name: elastic-internal-init-keystore
          env:
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
  - name: data
    config:
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
      node.attr.zone: ${ZONE}
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
        initContainers:
        - name: elastic-internal-init-keystore
          env:
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
```
Some additional notes:
- Your dedicated master nodes seem very over-specced; you could probably get away with 2 CPU and 4-8 GB RAM for a cluster of this size.
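For example, a rough sketch of setting those resources on the master nodeSet (the nodeSet name `masters` and the exact values are assumptions, adjust to your manifest):

```yaml
spec:
  nodeSets:
  - name: masters
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              cpu: 2
              memory: 4Gi
            limits:
              memory: 4Gi
```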
- 7 data nodes don't split evenly across multiple availability zones for high availability. I'm not sure what your underlying AWS availability zone layout looks like, but generally speaking you'd want to make sure the cluster can survive at least one AZ failure (e.g. 6 or 9 data nodes spread across 3 AZs).
- Try using GP3 volumes with XFS rather than the default ext4. I've seen moderate improvements with XFS, but it is somewhat use-case specific.
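For reference, a minimal StorageClass sketch for the AWS EBS CSI driver (the name `gp3-xfs` is an assumption, and this presumes the EBS CSI driver is installed; you'd reference it from your volumeClaimTemplates):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-xfs
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                        # GP3 EBS volumes
  csi.storage.k8s.io/fstype: xfs   # format as XFS instead of the default ext4
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```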
- I see you have `ml` set as a node role on your data nodes. Given that you're using the Basic license, you can probably remove this role. If you do plan on adding ML to the cluster later, it is generally recommended to use dedicated ML nodes.
- You should generally ensure that Elasticsearch nodes of the same type are evenly distributed across AZs. This can be done via topology spread constraints:
```yaml
spec:
  nodeSets:
  - name: masters
    podTemplate:
      spec:
        topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
        - labelSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
  - name: data
    podTemplate:
      spec:
        topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
        - labelSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
```
- While technically not required, I would generally recommend configuring a persistent volume for the master nodes as well.
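Something along these lines (a sketch; the 10Gi size and the `gp3-xfs` storage class name from the earlier note are assumptions):

```yaml
spec:
  nodeSets:
  - name: masters
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data   # claim name ECK expects for the data path
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: gp3-xfs
```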