A performance issue with Elasticsearch on Kubernetes

I'm running into an Elasticsearch-on-Kubernetes performance issue that confuses me. Here is my test environment. First, I have a Kubernetes cluster with 4 nodes: 1 master and 3 worker nodes:

NAME         STATUS   ROLES               AGE    VERSION
172.24.5.3   Ready    master,monitoring   234d   v1.13.5
172.24.5.4   Ready    monitoring,node     234d   v1.13.5
172.24.5.5   Ready    node                234d   v1.13.5
172.24.5.7   Ready    node                234d   v1.13.5

As you can see, my Kubernetes version is 1.13.5. I then used ECK (https://github.com/elastic/cloud-on-k8s) by running `kubectl apply -f https://download.elastic.co/downloads/eck/1.0.0-beta1/all-in-one.yaml` to create an Elasticsearch cluster with 5 nodes:

NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
elasticsearch-sample-es-client-0   1/1     Running   0          18m   10.16.33.84     172.24.5.7   <none>           <none>
elasticsearch-sample-es-data-0     1/1     Running   0          18m   10.16.33.79     172.24.5.7   <none>           <none>
elasticsearch-sample-es-data-1     1/1     Running   0          18m   10.16.215.184   172.24.5.5   <none>           <none>
elasticsearch-sample-es-data-2     1/1     Running   0          18m   10.16.184.199   172.24.5.4   <none>           <none>
elasticsearch-sample-es-master-0   1/1     Running   0          18m   10.16.215.181   172.24.5.5   <none>           <none>

To keep the tests consistent, I make sure the data nodes are spread across the three worker nodes. The CR file for this Elasticsearch cluster is as follows:

apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
  namespace: nes-elasticsearch
spec:
  version: 6.8.4
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: master
    config:
      node.master: true
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 16Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
  - name: client
    config:
      node.master: false
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 16Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
  - name: data
    config:
      node.master: false
      node.data: true
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 16Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi

You can see that in the CR above, the memory request and the memory limit are the same, both 16Gi. This is the key parameter: this test is all about memory request versus memory limit. The PV file is as follows:


apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv3
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.4

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv1
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.5
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv2
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.7

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv7
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test2
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.5

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv6
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test2
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.7

With the above files you can quickly create an Elasticsearch cluster like mine. I then use Rally (https://github.com/elastic/rally) to benchmark the cluster, using the http_logs dataset (https://github.com/elastic/rally-tracks/tree/master/http_logs) and testing only the index-append operation:

{
      "name": "index-append",
      "operation-type": "bulk",
      "bulk-size": {{bulk_size | default(5000)}},
      "ingest-percentage": {{ingest_percentage | default(100)}},
      "corpora": "http_logs"
}

The challenge is:

"schedule": [
        {
          "operation": "delete-index"
        },
        {
          "operation": {
            "operation-type": "create-index",
            "settings": {{index_settings | default({}) | tojson}}
          }
        },
        {
          "name": "check-cluster-health",
          "operation": {
            "operation-type": "cluster-health",
            "index": "logs-*",
            "request-params": {
              "wait_for_status": "{{cluster_health | default('green')}}",
              "wait_for_no_relocating_shards": "true"
            }
          }
        },
        {
          "operation": "index-append",
          "warmup-time-period": 240,
          "clients": {{bulk_indexing_clients | default(30)}}
        }
      ]

As you can see, I simplified the test to measure only indexing performance. I run esrally inside a Docker container on 172.24.5.3. First, get the password of the Elasticsearch cluster:

kubectl get secret elasticsearch-sample-es-elastic-user -n nes-elasticsearch -o=jsonpath='{.data.elastic}' | base64 --decode
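In my shell I capture the decoded password in a variable so the later esrally command can reference `$PASSWORD`. Since the real value comes from `kubectl` (which needs a live cluster), this sketch uses a stand-in base64 string to show just the decoding step:

```shell
# In the real setup ENCODED would be the output of:
#   kubectl get secret elasticsearch-sample-es-elastic-user -n nes-elasticsearch \
#     -o=jsonpath='{.data.elastic}'
# Here a stand-in base64 value keeps the snippet self-contained.
ENCODED="c3VwZXJzZWNyZXQ="
PASSWORD=$(printf '%s' "$ENCODED" | base64 --decode)
echo "$PASSWORD"   # prints the decoded secret, here "supersecret"
```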

Then run Rally:

esrally --pipeline=benchmark-only --target-hosts=192.168.12.3:9200 --track=/rally/.rally/benchmarks/tracks/http_logs --report-format=csv --report-file=result.csv --challenge=append-no-conflicts --client-options="use_ssl:false,verify_certs:false,basic_auth_user:'elastic',basic_auth_password:'$PASSWORD'"

192.168.12.3 is the cluster IP of the Elasticsearch HTTP service. I got the following test results:

Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,303.7706,min
Min cumulative indexing time across primary shards,,0.9421166666666667,min
Median cumulative indexing time across primary shards,,3.1037833333333333,min
Max cumulative indexing time across primary shards,,69.81788333333334,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min
Cumulative merge time of primary shards,,139.40831666666665,min
Cumulative merge count of primary shards,,3138,
Min cumulative merge time across primary shards,,0.09126666666666666,min
Median cumulative merge time across primary shards,,0.5575166666666667,min
Max cumulative merge time across primary shards,,26.99235,min
Cumulative merge throttle time of primary shards,,64.86913333333334,min
Min cumulative merge throttle time across primary shards,,0,min
Median cumulative merge throttle time across primary shards,,0.0576,min
Max cumulative merge throttle time across primary shards,,14.664250000000001,min
Cumulative refresh time of primary shards,,15.429016666666666,min
Cumulative refresh count of primary shards,,6023,
Min cumulative refresh time across primary shards,,0.06673333333333333,min
Median cumulative refresh time across primary shards,,0.14036666666666667,min
Max cumulative refresh time across primary shards,,2.721033333333333,min
Cumulative flush time of primary shards,,0.79705,min
Cumulative flush count of primary shards,,115,
Min cumulative flush time across primary shards,,0.00016666666666666666,min
Median cumulative flush time across primary shards,,0.00036666666666666667,min
Max cumulative flush time across primary shards,,0.24375,min
Total Young Gen GC,,236.31,s
Total Old Gen GC,,2.958,s
Store size,,22.122190072201192,GB
Translog size,,14.817421832121909,GB
Heap used for segments,,94.57985973358154,MB
Heap used for doc values,,0.1043548583984375,MB
Heap used for terms,,81.22084045410156,MB
Heap used for norms,,0.036376953125,MB
Heap used for points,,5.796648979187012,MB
Heap used for stored fields,,7.421638488769531,MB
Segment count,,596,
Min Throughput,index-append,170934.94,docs/s
Median Throughput,index-append,175795.68,docs/s
Max Throughput,index-append,182926.54,docs/s
50th percentile latency,index-append,852.5300654582679,ms
90th percentile latency,index-append,1073.6245419830084,ms
99th percentile latency,index-append,1436.844245232641,ms
99.9th percentile latency,index-append,3084.4296338940176,ms
99.99th percentile latency,index-append,3681.6509089201218,ms
100th percentile latency,index-append,4000.8082520216703,ms
50th percentile service time,index-append,852.5300654582679,ms
90th percentile service time,index-append,1073.6245419830084,ms
99th percentile service time,index-append,1436.844245232641,ms
99.9th percentile service time,index-append,3084.4296338940176,ms
99.99th percentile service time,index-append,3681.6509089201218,ms
100th percentile service time,index-append,4000.8082520216703,ms
error rate,index-append,0.00,%

These are the results with both the memory request and the memory limit set to 16Gi. I then changed the CR and followed the same steps to re-create the cluster and test it.

apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
  namespace: nes-elasticsearch
spec:
  version: 6.8.4
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: master
    config:
      node.master: true
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
  - name: client
    config:
      node.master: false
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
  - name: data
    config:
      node.master: false
      node.data: true
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi

You can see that only the memory request changed: it is now half of the memory limit (request 8Gi, limit 16Gi). I expected this cluster to perform no better than the previous one, but to my surprise, the results were better than in the previous test:

Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,250.88608333333335,min
Min cumulative indexing time across primary shards,,0.8436333333333333,min
Median cumulative indexing time across primary shards,,2.5911666666666666,min
Max cumulative indexing time across primary shards,,38.63725,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min
Cumulative merge time of primary shards,,108.69744999999999,min
Cumulative merge count of primary shards,,2561,
Min cumulative merge time across primary shards,,0.09261666666666667,min
Median cumulative merge time across primary shards,,0.5387666666666667,min
Max cumulative merge time across primary shards,,18.967783333333333,min
Cumulative merge throttle time of primary shards,,39.00376666666667,min
Min cumulative merge throttle time across primary shards,,0,min
Median cumulative merge throttle time across primary shards,,0.06693333333333333,min
Max cumulative merge throttle time across primary shards,,8.84365,min
Cumulative refresh time of primary shards,,11.112633333333333,min
Cumulative refresh count of primary shards,,4273,
Min cumulative refresh time across primary shards,,0.05848333333333333,min
Median cumulative refresh time across primary shards,,0.11405,min
Max cumulative refresh time across primary shards,,1.6635333333333333,min
Cumulative flush time of primary shards,,1.18685,min
Cumulative flush count of primary shards,,105,
Min cumulative flush time across primary shards,,0.00013333333333333334,min
Median cumulative flush time across primary shards,,0.00038333333333333334,min
Max cumulative flush time across primary shards,,0.34354999999999997,min
Total Young Gen GC,,163.803,s
Total Old Gen GC,,2.662,s
Store size,,22.243195474147797,GB
Translog size,,14.820328956469893,GB
Heap used for segments,,98.5560941696167,MB
Heap used for doc values,,0.13825225830078125,MB
Heap used for terms,,85.23204898834229,MB
Heap used for norms,,0.0382080078125,MB
Heap used for points,,5.784303665161133,MB
Heap used for stored fields,,7.36328125,MB
Segment count,,626,
Min Throughput,index-append,234070.03,docs/s
Median Throughput,index-append,249155.67,docs/s
Max Throughput,index-append,254879.21,docs/s
50th percentile latency,index-append,543.2271109893918,ms
90th percentile latency,index-append,711.716774106026,ms
99th percentile latency,index-append,1457.8318749740733,ms
99.9th percentile latency,index-append,3109.7604149319686,ms
99.99th percentile latency,index-append,4333.628856041162,ms
100th percentile latency,index-append,8061.395730823278,ms
50th percentile service time,index-append,543.2271109893918,ms
90th percentile service time,index-append,711.716774106026,ms
99th percentile service time,index-append,1457.8318749740733,ms
99.9th percentile service time,index-append,3109.7604149319686,ms
99.99th percentile service time,index-append,4333.628856041162,ms
100th percentile service time,index-append,8061.395730823278,ms
error rate,index-append,0.00,%

That is, request 8Gi / limit 16Gi performs better than request 16Gi / limit 16Gi. Across multiple runs, even the worst result of the former beats the latter. I have repeated this test many times, so the results are not accidental (https://github.com/elastic/cloud-on-k8s/issues/2402). Has anyone encountered the same problem, or am I doing something wrong?
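To quantify the gap, here is a quick check using the median index-append throughput figures copied verbatim from the two reports above:

```python
# Median index-append throughput from the two Rally reports (docs/s).
guaranteed = 175795.68  # request 16Gi, limit 16Gi
burstable = 249155.67   # request 8Gi,  limit 16Gi

improvement_pct = (burstable - guaranteed) / guaranteed * 100
print(f"{improvement_pct:.1f}% higher median throughput")  # ~41.7%
```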

Hi @ftyuuu,

do the underlying hosts have enough memory and CPU to handle the containers allocated to them?

I think the best place to start is looking at tasks, hot_threads, CPU usage, IO usage, etc., to get a picture of where the time is spent.

Hi @HenningAndersen,
First of all, thank you very much for your reply. Over the past few days I have monitored this test in more detail through Prometheus and /_nodes/stats (https://github.com/vvanholl/elasticsearch-prometheus-exporter), and compared the following metrics:

  1. container: container_network_transmit_bytes_total, container_network_receive_bytes_total, container_network_transmit_packets_total, container_network_receive_packets_total, container_fs_reads_bytes_total, container_fs_writes_bytes_total, container_cpu_system_seconds_total, container_cpu_user_seconds_total, container_cpu_usage_seconds_total, container_memory_usage_bytes, container_memory_working_set_bytes, container_memory_rss, container_memory_cache, container_memory_failures_total.

  2. es: es_os_cpu_percent, es_process_cpu_percent, es_os_load_average_one_minute, es_os_load_average_five_minute, es_jvm_mem_heap_used_percent, es_jvm_bufferpool_number, es_jvm_gc_collection_count, es_jvm_threads_number, es_os_mem_used_percent, es_os_mem_used_bytes, es_fs_io_total_read_bytes, es_fs_io_total_write_bytes, es_indices_refresh_total_count, es_indices_flush_total_count, es_indices_merge_total_count, es_transport_tx_bytes_count, es_transport_rx_bytes_count

All these metrics are monitored in Grafana, and I have carried out many experiments. Here is one of them:

1. JVM 4g, request 16Gi, limit 16Gi

Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,311.1318333333333,min
Min cumulative indexing time across primary shards,,0.7046333333333333,min
Median cumulative indexing time across primary shards,,2.6332166666666668,min
Max cumulative indexing time across primary shards,,56.22885,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min
Cumulative merge time of primary shards,,124.83886666666668,min
Cumulative merge count of primary shards,,2888,
Min cumulative merge time across primary shards,,0.08875000000000001,min
Median cumulative merge time across primary shards,,0.4968333333333333,min
Max cumulative merge time across primary shards,,23.507166666666667,min
Cumulative merge throttle time of primary shards,,52.10953333333334,min
Min cumulative merge throttle time across primary shards,,0,min
Median cumulative merge throttle time across primary shards,,0.061849999999999995,min
Max cumulative merge throttle time across primary shards,,10.831116666666667,min
Cumulative refresh time of primary shards,,12.843583333333333,min
Cumulative refresh count of primary shards,,5026,
Min cumulative refresh time across primary shards,,0.05001666666666667,min
Median cumulative refresh time across primary shards,,0.11185,min
Max cumulative refresh time across primary shards,,2.0578666666666665,min
Cumulative flush time of primary shards,,0.8113166666666667,min
Cumulative flush count of primary shards,,110,
Min cumulative flush time across primary shards,,0.00015,min
Median cumulative flush time across primary shards,,0.0005166666666666667,min
Max cumulative flush time across primary shards,,0.18055000000000002,min
Total Young Gen GC,,185.522,s
Total Old Gen GC,,3.66,s
Store size,,24.418405117467046,GB
Translog size,,14.818926102481782,GB
Heap used for segments,,101.23865032196045,MB
Heap used for doc values,,0.13006591796875,MB
Heap used for terms,,87.88238430023193,MB
Heap used for norms,,0.0372314453125,MB
Heap used for points,,5.793697357177734,MB
Heap used for stored fields,,7.395271301269531,MB
Segment count,,610,
Min Throughput,index-append,205678.38,docs/s
Median Throughput,index-append,211069.95,docs/s
Max Throughput,index-append,219601.17,docs/s
50th percentile latency,index-append,710.2141585201025,ms
90th percentile latency,index-append,909.8816037178041,ms
99th percentile latency,index-append,1108.3118677139275,ms
99.9th percentile latency,index-append,1260.5270746536557,ms
99.99th percentile latency,index-append,1487.3016146069629,ms
100th percentile latency,index-append,2166.5492579340935,ms
50th percentile service time,index-append,710.2141585201025,ms
90th percentile service time,index-append,909.8816037178041,ms
99th percentile service time,index-append,1108.3118677139275,ms
99.9th percentile service time,index-append,1260.5270746536557,ms
99.99th percentile service time,index-append,1487.3016146069629,ms
100th percentile service time,index-append,2166.5492579340935,ms
error rate,index-append,0.00,%

2. JVM 4g, request 8Gi, limit 16Gi

Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,240.67658333333333,min
Min cumulative indexing time across primary shards,,0.6001833333333334,min
Median cumulative indexing time across primary shards,,2.1107833333333335,min
Max cumulative indexing time across primary shards,,40.6904,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min
Cumulative merge time of primary shards,,105.22695,min
Cumulative merge count of primary shards,,2435,
Min cumulative merge time across primary shards,,0.09413333333333333,min
Median cumulative merge time across primary shards,,0.4098833333333333,min
Max cumulative merge time across primary shards,,19.173883333333333,min
Cumulative merge throttle time of primary shards,,36.93658333333334,min
Min cumulative merge throttle time across primary shards,,0,min
Median cumulative merge throttle time across primary shards,,0.04183333333333333,min
Max cumulative merge throttle time across primary shards,,8.55415,min
Cumulative refresh time of primary shards,,10.127483333333334,min
Cumulative refresh count of primary shards,,3858,
Min cumulative refresh time across primary shards,,0.048799999999999996,min
Median cumulative refresh time across primary shards,,0.08568333333333333,min
Max cumulative refresh time across primary shards,,1.5930666666666666,min
Cumulative flush time of primary shards,,1.2482499999999999,min
Cumulative flush count of primary shards,,105,
Min cumulative flush time across primary shards,,0.00013333333333333334,min
Median cumulative flush time across primary shards,,0.00025,min
Max cumulative flush time across primary shards,,0.3468833333333333,min
Total Young Gen GC,,187.717,s
Total Old Gen GC,,3.953,s
Store size,,22.587846134789288,GB
Translog size,,14.754520618356764,GB
Heap used for segments,,96.63905620574951,MB
Heap used for doc values,,0.10878372192382812,MB
Heap used for terms,,83.39537239074707,MB
Heap used for norms,,0.03607177734375,MB
Heap used for points,,5.7862749099731445,MB
Heap used for stored fields,,7.312553405761719,MB
Segment count,,591,
Min Throughput,index-append,272140.45,docs/s
Median Throughput,index-append,282387.09,docs/s
Max Throughput,index-append,291776.26,docs/s
50th percentile latency,index-append,529.4152842834592,ms
90th percentile latency,index-append,713.4810378775001,ms
99th percentile latency,index-append,934.3890321440992,ms
99.9th percentile latency,index-append,1735.9961382858462,ms
99.99th percentile latency,index-append,2020.953842085554,ms
100th percentile latency,index-append,2322.4159302189946,ms
50th percentile service time,index-append,529.4152842834592,ms
90th percentile service time,index-append,713.4810378775001,ms
99th percentile service time,index-append,934.3890321440992,ms
99.9th percentile service time,index-append,1735.9961382858462,ms
99.99th percentile service time,index-append,2020.953842085554,ms
100th percentile service time,index-append,2322.4159302189946,ms
error rate,index-append,0.00,%

You can see that the results of experiment 2 are again better, like the results I posted a few days ago. I still have a lot of confusion:

  1. I compared multiple metrics across the two tests. Most of the graphs look similar, except for Elasticsearch CPU. In my Kubernetes cluster, each node has 48 CPUs, and I assigned 8 CPUs to each pod, so at full load a pod accounts for about 16% of the whole physical machine. The CPU usage of test 2 (request 8Gi, limit 16Gi) is noticeably higher (this holds across many experiments).

  2. In addition, the refresh and merge times of test 2 (request 8Gi, limit 16Gi) are smaller, which can also be seen in the esrally test report.

  3. Below are charts for the remaining metrics:

[Grafana charts omitted]
My Kubernetes cluster has plenty of CPU and memory resources. Can anyone help me figure this out?

Hi @ftyuuu,

I am not a k8s expert, but it is my understanding that the memory request is not actively used for anything but allocation of containers/nodes. Can you paste the allocation of containers/nodes in the two cases here?

Hi @HenningAndersen, to ensure consistency in these tests, I spread the data nodes evenly across the 3 worker nodes, and also make sure the client node is on 172.24.5.7 and the master is on 172.24.5.4.

Hi @ftyuuu,

Thanks, that sounds good. I think I would like to see a _nodes and a _nodes/stats dump from the two setups to see if there is anything different in how ES sees the world.

@HenningAndersen Do you need _nodes and _nodes/stats information for the two clusters while they are running the tests, or for the empty clusters while idle? If it is the former, I think the charts above already show the status of _nodes/stats during the test.

@ftyuuu, let us just start with the two clusters empty, not executing anything.

Hi @HenningAndersen,
I retested and collected the _nodes, _nodes/stats, _cluster/settings, and _cluster/state information before and after the test. I put all the files on GitHub: https://github.com/ftyuuu/k8s-on-es-issue

Hi @HenningAndersen, first of all, thank you very much for your generous help. I found the cause of the problem: it is a CPU resource allocation issue caused by the Kubernetes QoS policy. The discussion and workaround are at https://github.com/kubernetes/kubernetes/issues/51135, and the same issue also comes up in https://github.com/helm/charts/pull/10872.
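For anyone landing here: the QoS class that triggers the different CPU treatment is derived from a pod's resource spec. Below is a minimal sketch of the classification rule, my simplified paraphrase of the documented Kubernetes behavior (single container, cpu/memory only), showing why the two otherwise identical CRs end up in different QoS classes:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container pod
    (paraphrasing the Kubernetes rules; cpu/memory only)."""
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed: limits set for both resources, and requests equal limits
    # (an unset request defaults to the limit).
    resources = ("cpu", "memory")
    if all(r in limits for r in resources) and all(
        requests.get(r, limits[r]) == limits[r] for r in resources
    ):
        return "Guaranteed"
    return "Burstable"

# First CR: request 16Gi == limit 16Gi -> "Guaranteed"
print(qos_class({"cpu": 8, "memory": "16Gi"}, {"cpu": 8, "memory": "16Gi"}))
# Second CR: request 8Gi < limit 16Gi -> "Burstable"
print(qos_class({"cpu": 8, "memory": "8Gi"}, {"cpu": 8, "memory": "16Gi"}))
```

On a live cluster the assigned class can be checked with `kubectl get pod <name> -o jsonpath='{.status.qosClass}'`; see the linked issue for how the kubelet then treats CPU for each class.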

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.