I'm running into an Elasticsearch-on-Kubernetes performance issue that confuses me; here is my test environment. First, I have a k8s cluster with 4 nodes: 1 master and 3 worker nodes:
NAME STATUS ROLES AGE VERSION
172.24.5.3 Ready master,monitoring 234d v1.13.5
172.24.5.4 Ready monitoring,node 234d v1.13.5
172.24.5.5 Ready node 234d v1.13.5
172.24.5.7 Ready node 234d v1.13.5
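For reference, the listing above is simply the plain output of:

kubectl get nodes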
As you can see, my k8s version is 1.13.5. I then use ECK (https://github.com/elastic/cloud-on-k8s, installed by running kubectl apply -f https://download.elastic.co/downloads/eck/1.0.0-beta1/all-in-one.yaml) to create an es cluster with 5 nodes:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
elasticsearch-sample-es-client-0 1/1 Running 0 18m 10.16.33.84 172.24.5.7 <none> <none>
elasticsearch-sample-es-data-0 1/1 Running 0 18m 10.16.33.79 172.24.5.7 <none> <none>
elasticsearch-sample-es-data-1 1/1 Running 0 18m 10.16.215.184 172.24.5.5 <none> <none>
elasticsearch-sample-es-data-2 1/1 Running 0 18m 10.16.184.199 172.24.5.4 <none> <none>
elasticsearch-sample-es-master-0 1/1 Running 0 18m 10.16.215.181 172.24.5.5 <none> <none>
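For reference, the pod listing above is the wide pod output for the namespace the cluster runs in:

kubectl get pods -n nes-elasticsearch -o wide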
To keep the test consistent, I make sure the data nodes are spread across the three k8s worker nodes. The CR file for this es cluster is as follows:
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
  namespace: nes-elasticsearch
spec:
  version: 6.8.4
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: master
    config:
      node.master: true
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 16Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
  - name: client
    config:
      node.master: false
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 16Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
  - name: data
    config:
      node.master: false
      node.data: true
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 16Gi
              cpu: 8
            limits:
              memory: 16Gi
              cpu: 8
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
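The CR is applied in the usual way (the file name elasticsearch-sample.yaml below is just an example):

kubectl apply -f elasticsearch-sample.yaml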
Note that in the CR above, the memory request and the memory limit are set to the same value, 16Gi. This is the key parameter here, because this test is specifically about the effect of memory request and memory limit. The PV files are as follows:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv3
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.4
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv1
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.5
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv2
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.7
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv7
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test2
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.5
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv6
spec:
  capacity:
    storage: 32Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /home/data/eck-test2
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 172.24.5.7
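One thing to keep in mind with local PersistentVolumes: the path must already exist on the node before the PV can be used, so the directories have to be created up front. A minimal sketch, using an example file name for the PV manifest:

# on each worker node referenced by the PVs above
mkdir -p /home/data/eck-test1 /home/data/eck-test2
# then apply the PV manifest from a machine with kubectl access
kubectl apply -f es-local-pv.yaml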
With the files above, you can quickly create an es cluster like mine. I then use esrally (https://github.com/elastic/rally) to benchmark this cluster, using the http_logs track (https://github.com/elastic/rally-tracks/tree/master/http_logs) and testing only the index-append operation:
{
  "name": "index-append",
  "operation-type": "bulk",
  "bulk-size": {{bulk_size | default(5000)}},
  "ingest-percentage": {{ingest_percentage | default(100)}},
  "corpora": "http_logs"
}
The challenge is:
"schedule": [
{
"operation": "delete-index"
},
{
"operation": {
"operation-type": "create-index",
"settings": {{index_settings | default({}) | tojson}}
}
},
{
"name": "check-cluster-health",
"operation": {
"operation-type": "cluster-health",
"index": "logs-*",
"request-params": {
"wait_for_status": "{{cluster_health | default('green')}}",
"wait_for_no_relocating_shards": "true"
}
}
},
{
"operation": "index-append",
"warmup-time-period": 240,
"clients": {{bulk_indexing_clients | default(30)}}
}
]
As you can see, I simplified the default test so that only indexing performance is measured. I run esrally inside a Docker container on 172.24.5.3. First, get the es cluster's password:
kubectl get secret elasticsearch-sample-es-elastic-user -n nes-elasticsearch -o=jsonpath='{.data.elastic}' | base64 --decode
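To make the password available to the esrally command below, it can be captured in a shell variable (PASSWORD is just the illustrative name the command below expects):

PASSWORD=$(kubectl get secret elasticsearch-sample-es-elastic-user -n nes-elasticsearch -o=jsonpath='{.data.elastic}' | base64 --decode)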
Then run rally:
esrally --pipeline=benchmark-only --target-hosts=192.168.12.3:9200 --track=/rally/.rally/benchmarks/tracks/http_logs --report-format=csv --report-file=result.csv --challenge=append-no-conflicts --client-options="use_ssl:false,verify_certs:false,basic_auth_user:'elastic',basic_auth_password:'$PASSWORD'"
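For reference, the ClusterIP passed to --target-hosts can be looked up from the HTTP service that ECK creates for the cluster (ECK names it <cluster-name>-es-http):

kubectl get svc elasticsearch-sample-es-http -n nes-elasticsearch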
192.168.12.3 is the ClusterIP of the es HTTP service. The test results are as follows:
Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,303.7706,min
Min cumulative indexing time across primary shards,,0.9421166666666667,min
Median cumulative indexing time across primary shards,,3.1037833333333333,min
Max cumulative indexing time across primary shards,,69.81788333333334,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min
Cumulative merge time of primary shards,,139.40831666666665,min
Cumulative merge count of primary shards,,3138,
Min cumulative merge time across primary shards,,0.09126666666666666,min
Median cumulative merge time across primary shards,,0.5575166666666667,min
Max cumulative merge time across primary shards,,26.99235,min
Cumulative merge throttle time of primary shards,,64.86913333333334,min
Min cumulative merge throttle time across primary shards,,0,min
Median cumulative merge throttle time across primary shards,,0.0576,min
Max cumulative merge throttle time across primary shards,,14.664250000000001,min
Cumulative refresh time of primary shards,,15.429016666666666,min
Cumulative refresh count of primary shards,,6023,
Min cumulative refresh time across primary shards,,0.06673333333333333,min
Median cumulative refresh time across primary shards,,0.14036666666666667,min
Max cumulative refresh time across primary shards,,2.721033333333333,min
Cumulative flush time of primary shards,,0.79705,min
Cumulative flush count of primary shards,,115,
Min cumulative flush time across primary shards,,0.00016666666666666666,min
Median cumulative flush time across primary shards,,0.00036666666666666667,min
Max cumulative flush time across primary shards,,0.24375,min
Total Young Gen GC,,236.31,s
Total Old Gen GC,,2.958,s
Store size,,22.122190072201192,GB
Translog size,,14.817421832121909,GB
Heap used for segments,,94.57985973358154,MB
Heap used for doc values,,0.1043548583984375,MB
Heap used for terms,,81.22084045410156,MB
Heap used for norms,,0.036376953125,MB
Heap used for points,,5.796648979187012,MB
Heap used for stored fields,,7.421638488769531,MB
Segment count,,596,
Min Throughput,index-append,170934.94,docs/s
Median Throughput,index-append,175795.68,docs/s
Max Throughput,index-append,182926.54,docs/s
50th percentile latency,index-append,852.5300654582679,ms
90th percentile latency,index-append,1073.6245419830084,ms
99th percentile latency,index-append,1436.844245232641,ms
99.9th percentile latency,index-append,3084.4296338940176,ms
99.99th percentile latency,index-append,3681.6509089201218,ms
100th percentile latency,index-append,4000.8082520216703,ms
50th percentile service time,index-append,852.5300654582679,ms
90th percentile service time,index-append,1073.6245419830084,ms
99th percentile service time,index-append,1436.844245232641,ms
99.9th percentile service time,index-append,3084.4296338940176,ms
99.99th percentile service time,index-append,3681.6509089201218,ms
100th percentile service time,index-append,4000.8082520216703,ms
error rate,index-append,0.00,%