Hi,
I have configured an Elasticsearch 6.2.4 cluster on Kubernetes in GCP.
Kubernetes:
9 nodes across west3 a, b, and c
24 CPUs and 45 GB RAM per node
Elasticsearch:
3 master nodes, each with 2 CPUs and 4 GB RAM
2 coordination nodes, each with 2 CPUs and 4 GB RAM
3 data nodes, each with 4 CPUs, 16 GB RAM, and standard disks
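The node roles are split per pod via elasticsearch.yml in the usual 6.x way, roughly like this (a sketch, the actual manifests set a few more things):
# master pods
node.master: true
node.data: false
node.ingest: false
# coordination pods
node.master: false
node.data: false
node.ingest: false
# data pods
node.master: false
node.data: true
node.ingest: false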
From another pod in the same Kubernetes cluster I run a single esrally (0.11.0 on Ubuntu 18.04) instance:
esrally --track=http_logs --target-hosts=es-coord.database.svc.cluster.local:443 --pipeline=benchmark-only --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'user',basic_auth_password:'pass'"
I get 148587 documents/sec and 8 default queries/sec at 196 ms latency. That is not bad, but no matter what I do, I don't see linear Elasticsearch scaling:
6 data nodes: 180000 documents/sec
9 data nodes: 196000 documents/sec
3 data nodes with ssd: 152000 documents/sec
6 data nodes with ssd: 202000 documents/sec
9 data nodes with ssd: 203000 documents/sec
3 data nodes only in west3a: 145000 documents/sec
6 data nodes only in west3a: 187000 documents/sec
9 data nodes only in west3a: 182000 documents/sec
9 data nodes only in west3a on small K8s nodes: 179000 documents/sec
I have also tried using the coordination nodes' IPs for --target-hosts instead of the K8s service, and I have tried adding more coordination nodes. Using --track-params="clients:64" and running more esrally daemons made absolutely no difference either. We use the X-Pack monitoring metrics and Metricbeat for the containers, but I don't see a bottleneck; none of the involved processes are particularly busy. The only suspicious thing is that I never see more than 150 Mbit/s of network throughput, but that should not affect the query results.
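For reference, the direct-IP invocation looks roughly like this (the IPs and pod port here are placeholders); the second command is the kind of check I can run against the bulk thread pool to spot rejections during a run:
esrally --track=http_logs --target-hosts=10.0.0.1:9200,10.0.0.2:9200 --pipeline=benchmark-only --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'user',basic_auth_password:'pass'"
curl -k -u user:pass "https://es-coord.database.svc.cluster.local:443/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected"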
Do you have any ideas where the bottleneck could be?