I have set up a cluster with the objective of testing Elasticsearch nodes using Rally. As of now I have a benchmark coordinator, 2 load drivers and 3 Elasticsearch nodes. I wish to maximize the load I can place upon these 3 nodes. Looking for recommendations on how to set up the cluster for peak throughput and CPU utilization.
This question is a bit open-ended without a definition of load (ingest? search?), other parameters like architecture and mappings, and an understanding of what you want to achieve by benchmarking.
I am running performance benchmarks using Rally, with the cluster on AWS M5a.4xlarge instances. I wish to apply load from the default Rally tracks (currently the geopoint track) to my Elasticsearch nodes to maximize CPU utilization and throughput (docs/s). This exercise is simply to test how well Elasticsearch performs on this instance type.
The setup consists of the following:
1 benchmark coordinator
2 load drivers
1 to 4 Elasticsearch nodes
Aim - To maximize CPU utilization and throughput as I scale up from one node to four.
Problem - Unable to increase the CPU utilization of the Elasticsearch nodes.
Elasticsearch performance and throughput are often limited by disk I/O rather than CPU. What type and size of EBS storage do you have on your nodes? Have you monitored disk utilization and iowait while running the benchmark to see if this might be limiting performance?
All systems, including the Rally machines, are M5a.4xlarge: 16 cores, 64 GB RAM, with 60,000 provisioned IOPS. iowait doesn't go higher than ~1% and CPU utilization is roughly 70%.
What indexing rate are you seeing? Which track are you using? How many clients is Rally using?
For nodes that powerful, a lot of the default tracks are quite small. You may want to try the rally-eventdata-track, which generates data on the fly. It is more CPU intensive but can be run for long periods of time at reasonably high concurrency without resetting.
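If you want to give it a try, the invocation would look roughly like this. Treat it as a sketch: the host names and challenge name are placeholders, the exact flags depend on your Rally version, and it assumes you have registered the eventdata track repository in `~/.rally/rally.ini` as described in the rally-eventdata-track README. You can check what is available with `esrally list tracks --track-repository=eventdata`.

```
# Sketch only: host names and challenge are placeholders, and the eventdata
# track repository must already be registered in ~/.rally/rally.ini.
esrally race --track-repository=eventdata --track=eventdata \
  --challenge=elasticlogs-1bn-load \
  --pipeline=benchmark-only \
  --target-hosts=es-node-1:9200,es-node-2:9200,es-node-3:9200
```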
As @Christian_Dahlqvist mentioned, it's typically I/O that takes a bigger hit than CPU for ingest and search.
With ingestion, once you are certain that the load driver isn't the bottleneck, apart from tweaking the number of clients you can experiment with increasing the bulk_size (which is 5000 by default for the geopoint track you are currently using). Higher throughput (and load) can also be achieved by increasing the number of shards; on a recent 3-node setup with another track, while trying to explain the modest indexing throughput, we got higher results by increasing the number of primary shards.
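These knobs can usually be overridden on the command line via track parameters. The parameter names differ per track, so check the geopoint track's README; the values below are purely illustrative and the host names are placeholders:

```
# Sketch only: parameter names and values are illustrative, check the track README.
esrally race --track=geopoint \
  --pipeline=benchmark-only \
  --target-hosts=es-node-1:9200,es-node-2:9200,es-node-3:9200 \
  --track-params="bulk_size:10000,bulk_indexing_clients:16,number_of_shards:6"
```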
Note that the geopoint track you are using has a rather small corpus; of the default tracks with pre-generated data, you could try the nyc_taxis track, which has a large corpus and uses a bigger bulk size by default.
Finally, depending on the use case, the default compression can be changed to best_compression (see the index.codec setting); this is, for example, what the eventdata track uses. best_compression will also increase CPU usage and can be attractive for logging use cases where disk costs need to be kept under control.
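If you want to see what the setting itself looks like outside of what a track configures for you, it can be applied when creating an index (index.codec can only be set at index creation time or on a closed index). The index name and host here are placeholders:

```
# Sketch only: index name and host are placeholders.
curl -X PUT "http://es-node-1:9200/my-logging-index" \
  -H 'Content-Type: application/json' \
  -d '{
        "settings": {
          "index.codec": "best_compression"
        }
      }'
```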
Many of the common pitfalls and topics touched on here are discussed in the links I mentioned in my previous comment.