Slower Performance with Elasticsearch cluster in Kubernetes compared to Docker

A few different things I see here that can be improved:

  1. Given that you're using dedicated master nodes, you should ensure that traffic via the cluster service is routed only to the data nodes. You can do this by adding a selector to the HTTP service spec:
```yaml
spec:
  http:
    service:
      spec:
        selector:
          elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
          elasticsearch.k8s.elastic.co/node-data: 'true'
```
  2. Look at setting availability zone awareness on your nodes. This will help ensure that Elasticsearch distributes shards evenly across zones. For example:
```yaml
spec:
  nodeSets:
    - name: masters
      config:
        cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
        node.attr.zone: ${ZONE}
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              env:
                - name: ZONE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['topology.kubernetes.io/zone']
          initContainers:
            - name: elastic-internal-init-keystore
              env:
                - name: ZONE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['topology.kubernetes.io/zone']
    - name: data
      config:
        cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
        node.attr.zone: ${ZONE}
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              env:
                - name: ZONE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['topology.kubernetes.io/zone']
          initContainers:
            - name: elastic-internal-init-keystore
              env:
                - name: ZONE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['topology.kubernetes.io/zone']
```

Some additional notes:

  1. Your dedicated master nodes seem very over-spec'd; you could probably get away with 2 CPU and 4-8 GB RAM for a cluster of this size (see the resource sketch after this list).
  2. 7 data nodes don't "split" nicely across multiple availability zones for high availability; for example, 6 or 9 nodes divide evenly across three AZs. I'm not sure what your underlying AWS availability zone architecture looks like, but generally speaking, you'd want to make sure your cluster can handle at least one AZ failure.
  3. Try using gp3 volumes with XFS rather than the default ext4. I've found moderate improvements with XFS, but it is also somewhat use-case specific (a StorageClass sketch follows this list).
  4. I see you have ml set as a node role on your data nodes. Given that you're using the basic license, you can probably remove this node role (an example node.roles config follows this list). Also, if you do plan on adding ML to the cluster, it is generally recommended to use dedicated ML nodes.
  5. You should generally ensure that Elasticsearch nodes of the same type are evenly distributed across AZs. This can be done via:
```yaml
spec:
  nodeSets:
    - name: masters
      podTemplate:
        spec:
          topologySpreadConstraints:
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
              maxSkew: 1
              topologyKey: kubernetes.io/hostname
              whenUnsatisfiable: DoNotSchedule
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
              maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
    - name: data
      podTemplate:
        spec:
          topologySpreadConstraints:
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
              maxSkew: 1
              topologyKey: kubernetes.io/hostname
              whenUnsatisfiable: DoNotSchedule
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: <cluster-name>
              maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
```
  6. While technically not required, I would generally recommend setting a persistent volume on the master nodes (a volume claim sketch follows below).
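
For the master sizing point, here's a minimal sketch of what that could look like in the ECK manifest; the nodeSet name and the exact requests/limits are assumptions to adjust for your environment:

```yaml
spec:
  nodeSets:
    - name: masters          # assumed nodeSet name
      count: 3
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                # Placeholder values in the 2 CPU / 4-8 GB range mentioned above.
                requests:
                  cpu: 2
                  memory: 4Gi
                limits:
                  cpu: 2
                  memory: 4Gi
```

On recent Elasticsearch versions the JVM heap is sized automatically from the container memory limit, so you generally don't need to set `ES_JAVA_OPTS` by hand.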
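
For the gp3 + XFS suggestion, a sketch of a StorageClass, assuming the AWS EBS CSI driver is installed; the name is illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-xfs               # illustrative name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: xfs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Reference it from the volumeClaimTemplates in your Elasticsearch spec via `storageClassName`.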
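
Dropping the ml role from the data nodes would look something like this; the exact role list is an assumption, so keep whichever roles you actually need:

```yaml
spec:
  nodeSets:
    - name: data
      config:
        # ml removed; keep only the roles this tier actually needs.
        node.roles: ["data", "ingest"]
```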
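
And for the persistent volume on the masters, a minimal sketch; `elasticsearch-data` is the claim name ECK expects, while the size and storage class are placeholders:

```yaml
spec:
  nodeSets:
    - name: masters
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data   # claim name expected by ECK
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi          # placeholder size
            storageClassName: gp3-xfs  # e.g. the StorageClass sketched above
```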