I would like to benchmark high request volumes using esrally. I'm trying to reach somewhere between 500k and 1M requests per minute (RPM). I've modified an existing challenge, changing the clients and target-throughput values to match 500k RPM with a few thousand clients. However, esrally seems to hang and I don't see any traffic reaching my cluster.
I don't think I'm using the best approach to accomplish this. I'd appreciate any advice.
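As a point of reference, Rally's track documentation describes target-throughput as requests per second over all clients of a task, not requests per minute. So 500k RPM is roughly 8,333 ops/s and 1M RPM is roughly 16,667 ops/s. The sketch below does that conversion and builds one schedule entry. It is a minimal illustration only: the operation name "my-search", the client count and the iteration counts are hypothetical placeholders, and the helper functions are not part of Rally.

```python
import json


def rpm_to_target_throughput(requests_per_minute: float) -> float:
    """Convert RPM to Rally's target-throughput (requests/s across all clients)."""
    return requests_per_minute / 60.0


def per_client_rate(target_throughput: float, clients: int) -> float:
    """Requests per second each client must issue if the load is split evenly."""
    return target_throughput / clients


def schedule_entry(operation: str, clients: int, requests_per_minute: float,
                   warmup_iterations: int, iterations: int) -> dict:
    """Build one schedule task; 'operation' must name an operation in the track."""
    return {
        "operation": operation,
        "clients": clients,
        # Iteration counts are executed by each client.
        "warmup-iterations": warmup_iterations,
        "iterations": iterations,
        "target-throughput": round(rpm_to_target_throughput(requests_per_minute), 2),
    }


if __name__ == "__main__":
    # Hypothetical values for illustration: "my-search" and 2000 clients.
    entry = schedule_entry("my-search", clients=2000, requests_per_minute=500_000,
                           warmup_iterations=1000, iterations=10000)
    print(json.dumps(entry, indent=2))
    print("per-client rate (req/s):", per_client_rate(entry["target-throughput"], 2000))
```

With 2,000 clients, each client only needs a little over 4 requests per second to reach 500k RPM. If individual requests complete quickly, far fewer clients may be enough to reach the target.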
Actually, it appears that when running the benchmark-only pipeline, I only see 1 request coming through in my data node monitor, regardless of the target-throughput rate. Shouldn't the number of incoming requests correspond to the target-throughput rate? What am I doing wrong?
What is the specification of the cluster you are going to benchmark? What type of data do you have? How much data do you have? What are your requirements around query latency?
Rally spins up a process per connection, so trying to set up a very large number of clients on a single host is going to be inefficient. I would recommend starting low and gradually increasing the query concurrency for as long as you are still meeting your latency requirements. That way you will see how many concurrent queries, and how many queries per second, your cluster can handle. If you have a powerful cluster, you may at some point need to run Rally in distributed mode in order to generate enough concurrent queries.