Slower performance with Elasticsearch cluster in Kubernetes compared to Docker

Hi All,

I've had the same topic open before, but it seems our issue has returned after implementing the feedback and testing once more. As mentioned in the title, we're seeing much slower performance when our cluster is hosted on AWS EKS compared to Docker containers.

We're loading the same volume of data into both clusters (Docker and Kubernetes) and running the same Gatling test with 200 concurrent users. What we're seeing is fast performance on Docker (2-3 s response times for most GET requests) and much slower performance on Kubernetes (around 20 s response times).

Has there been a comparison between Docker and Kubernetes setups before?

This is what my values look like:

eck-elasticsearch:
  fullnameOverride: eck-elasticsearch
  version: 7.16.3
  annotations:
    eck.k8s.elastic.co/license: basic
    eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone" # allows specifying which node label should be used to determine the availability zone name
  http:
    service: 
      spec:
        selector: 
          elasticsearch.k8s.elastic.co/cluster-name: eck-elasticsearch
          elasticsearch.k8s.elastic.co/node-data: 'true' #Enable traffic routing via data nodes only
    tls:
      selfSignedCertificate:
        disabled: true
  updateStrategy:
    changeBudget:
      maxSurge: 3
      maxUnavailable: 1
  nodeSets:
  - name: masters
    count: 3
    # podDisruptionBudget:
    #   spec:
    #     minAvailable: 2
    #     selector:
    #       matchLabels:
    #         elasticsearch.k8s.elastic.co/cluster-name: quickstart
    config:
      node.roles: ["master"]
      #Enable ES zone awareness (node and zone) for even distribution of shards.
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
      node.attr.zone: $ZONE
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          # specify that the annotation that was added on the Pod by the operator due to the `eck.k8s.elastic.co/downward-node-labels` annotation feature should be automatically transformed to an env var
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
          resources:
            requests:
              cpu: 2
              memory: 8Gi
            limits:
              cpu: 2
              memory: 8Gi
        #Enable master nodes to be evenly balanced across AZs
        topologySpreadConstraints:
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: eck-elasticsearch
              maxSkew: 1
              topologyKey: kubernetes.io/hostname
              whenUnsatisfiable: DoNotSchedule
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: eck-elasticsearch
              maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
        initContainers:
        - command:
          - sh
          - "-c"
          - sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
        - command:
          - sh
          - "-c"
          - bin/elasticsearch-plugin install --batch mapper-annotated-text
          name: install-plugins
          securityContext:
            privileged: true
  - name: data
    count: 9
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: encrypted-gp3-retain
    config:
      node.roles: ["data", "ingest", "transform"]
      #Enable ES zone awareness for even distribution of shards.
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
      node.attr.zone: $ZONE
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ZONE
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['topology.kubernetes.io/zone']
          resources:
            requests:
              cpu: 4
              memory: 16Gi
            limits:
              cpu: 4
              memory: 16Gi
        #Enable data nodes to be evenly balanced across AZs
        topologySpreadConstraints:
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: eck-elasticsearch
              maxSkew: 1
              topologyKey: kubernetes.io/hostname
              whenUnsatisfiable: DoNotSchedule
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: eck-elasticsearch
              maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway 
        initContainers:
        - command:
          - sh
          - "-c"
          - sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
        - command:
          - sh
          - "-c"
          - bin/elasticsearch-plugin install --batch mapper-annotated-text
          name: install-plugins
          securityContext:
            privileged: true

Even using an over-specced Kubernetes cluster, our performance is still significantly slower than on Docker. Here we're using 9 data nodes with 16 GB RAM and 4 CPUs each, which should be more than enough. Any ideas on where the bottleneck could be?
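One note on the values above: we don't set an explicit JVM heap size, so the nodes rely on Elasticsearch's automatic heap sizing. A minimal sketch of pinning the heap on the data nodes, in case that is relevant (the 8g figure is an illustrative assumption, roughly half of the 16Gi limit, leaving the remainder for the OS page cache):

podTemplate:
  spec:
    containers:
    - name: elasticsearch
      env:
      - name: ES_JAVA_OPTS
        value: "-Xms8g -Xmx8g" # illustrative: ~50% of the 16Gi limit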

What is the size of your data set? How much indexed data is stored per node?

If all indexed data cannot fit in the operating system page cache, disk I/O commonly becomes the limiting factor for performance. What type of storage are you using? Is it exactly the same for the Docker nodes as for the Kubernetes pods?

Does the latency difference vary depending on the number of concurrent queries?


Uploaded a screenshot of the indices sizes; ~3 TB of data is being stored.

We're using gp3 EBS volumes for both Docker and Kubernetes.
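For reference, a StorageClass like encrypted-gp3-retain would typically be along these lines (a sketch assuming the AWS EBS CSI driver; the iops and throughput parameters are illustrative and optional, since gp3 defaults to 3000 IOPS and 125 MiB/s when they are omitted):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gp3-retain
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "3000"       # illustrative; gp3 baseline
  throughput: "125"  # illustrative; MiB/s
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true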

Performance does seem to be much faster with fewer concurrent users (around 20) but slower with around 200 users (which is not the case on Docker).

Our Docker setup:
All Elasticsearch containers run on a single EC2 instance (m5.8xlarge).

Kubernetes setup:
Multi-AZ setup where the Elasticsearch pods are spread evenly across the zones on m5a.4xlarge instances.

We did try having all Elasticsearch pods in our Kubernetes setup on a single EC2 instance (m5.8xlarge), but response times were still slow.

Just wanted to know: is it recommended to leave mmap enabled for production workloads, along with setting vm.max_map_count=262144?
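For reference, in the values above that combination would look roughly like this (a sketch: drop the node.store.allow_mmap: false override so Elasticsearch uses mmap by default, and keep the existing sysctl init container that raises vm.max_map_count):

config:
  node.roles: ["data", "ingest", "transform"]
  cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
  node.attr.zone: $ZONE
  # node.store.allow_mmap left at its default (true)
podTemplate:
  spec:
    initContainers:
    - name: sysctl
      securityContext:
        privileged: true
        runAsUser: 0
      command: ["sh", "-c", "sysctl -w vm.max_map_count=262144"]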

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.