I’ve been running Rally multiple times on the same system as part of a benchmarking project, but I keep seeing huge variance in how long each run takes. How can I debug this further to understand what’s causing such a wide range of times?
With only the total run time, it's tricky to say exactly why. Looking at the track, there is a "wait until merges complete" stage, and I wonder if that is the cause. The time does seem excessive, though, since the second run appears to have taken 5 hours longer than the first. The merges being waited on are counted across the whole cluster, so if you are using the cluster for other things at the same time, that activity could be affecting the results.
You should be able to see which steps are taking so long in the output as well as in the metrics. Take a look at the logs and let us know what you find.
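Once you have per-task timings for two runs, one way to narrow things down is to rank the tasks by how much slower they got. Here is a minimal Python sketch of that comparison. It assumes you have already copied the durations out of the output or metrics by hand; the function name, the task names, and all the numbers below are made up for illustration and do not come from your runs or from Rally itself.

```python
def rank_task_slowdowns(baseline, contender):
    """Return (task, baseline_s, contender_s, delta_s) tuples, largest slowdown first.

    Both arguments map a task name to its duration in seconds. Only tasks
    present in both runs are compared.
    """
    rows = []
    for task in sorted(set(baseline) & set(contender)):
        delta = contender[task] - baseline[task]
        rows.append((task, baseline[task], contender[task], delta))
    # Sort by the delta column so the biggest regression comes first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Hypothetical per-task durations in seconds (placeholders, not real results).
run_1 = {
    "index-append": 3600.0,
    "wait-until-merges-finish": 120.0,
    "term-query": 300.0,
}
run_2 = {
    "index-append": 3700.0,
    "wait-until-merges-finish": 18120.0,
    "term-query": 310.0,
}

for task, before, after, delta in rank_task_slowdowns(run_1, run_2):
    print(f"{task:<28} {before:>10.1f}s {after:>10.1f}s {delta:>+10.1f}s")
```

If a single task accounts for nearly all of the difference, that is the stage to dig into in the logs. If the slowdown is spread evenly across every task, contention from other work on the same cluster becomes the more likely explanation.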