We are seeing noticeably slow queries on our new ES 6.1.2 cluster. To understand more, we started running the Rally Geonames benchmark to get performance numbers for the cluster.
The test results show that querying is pretty slow.
Here is our test configuration, which runs on AWS:
Elasticsearch Version = 6.1.2
Master Node = 3 (r4.large)
Data Node = 3 (i3.2xlarge)
Front Node = 1 (r4.2xlarge), which is also the Rally host
Rally Test Track - Geonames
java version "1.8.0_161"
Summary of our analysis of the data:
- We ran the same Rally tests on our legacy ES 2.4 cluster (3 data nodes) using the same hardware in AWS. The 2.4 cluster performed far better than the 6.1.2 cluster.
- We also ran the same Rally tests on a 6.1.2 cluster with 1 data node. Its performance numbers are far better than those of the 3-data-node 6.1.2 cluster, but worse than those of the 2.4 cluster.
- For the country_agg_uncached numbers, there is a huge difference between latency and service time. We checked CPU utilization and the other system metrics; CPU utilization hovers at only around 40-50%.
- We have run these tests multiple times over the last week, and the results from Rally are consistent.
- We are running a basic configuration, with almost nothing changed in the Elasticsearch configuration for the 6.1.2 cluster.
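For anyone reading along, here is a rough sketch of why latency and service time can drift apart in a benchmark like this. It assumes Rally's usual definitions: service time is the time between sending a request and receiving the response, while latency also includes the time a request waits because the client could not keep up with its target schedule. The function name `simulate` and all the numbers are made up for illustration; this is a toy single-client model, not Rally's actual scheduler.

```python
def simulate(target_interval_s, service_time_s, n_requests):
    """Toy model of one benchmark client with a fixed target schedule.

    Returns a list of (service_time, latency) tuples, one per request.
    Hypothetical helper for illustration only; not part of Rally.
    """
    results = []
    free_at = 0.0  # time at which the client finished its previous request
    for i in range(n_requests):
        scheduled = i * target_interval_s   # when the request should be sent
        start = max(scheduled, free_at)     # it cannot start before the client is free
        end = start + service_time_s        # the cluster takes service_time_s to answer
        free_at = end
        # service time ignores the wait before sending; latency includes it
        results.append((end - start, end - scheduled))
    return results


if __name__ == "__main__":
    # Target: one request every 0.5 s, but each request takes 1.0 s to serve.
    for service, latency in simulate(0.5, 1.0, 4):
        print(f"service_time={service:.1f}s latency={latency:.1f}s")
```

In this model the service time stays flat while the latency keeps growing with every request, which is the pattern you would expect when the target throughput of an operation is higher than what the cluster can sustain. If the interval is larger than the service time, the two values are identical.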
I am attaching part of the test results.
Any pointers or thoughts on what might be going on in our cluster? We are migrating our users from 2.4 to 6.1.2 and want to get a good handle on this before we move everyone to the new cluster and shut down the old one.