Large time variance between tests on the same system running the same track

I’ve been running Rally multiple times on the same system as part of a benchmarking project, but I keep seeing huge variance in how long each run takes. How can I debug this further to understand what’s causing such a wide range of times?

I’m running the cohere_vector track.

[INFO] SUCCESS (took 11748 seconds)

[INFO] SUCCESS (took 30335 seconds)

[INFO] SUCCESS (took 22321 seconds)

Hello!

With only the total run time, it's tricky to say exactly why. Looking at the track, there is a "wait until merges complete" stage, and I wonder if this is the cause. The time seems quite excessive though, since the second run took about 5 hours longer than the first. The merges it waits on are counted across the whole cluster, so if you are using the cluster for other things at the same time, that activity could be affecting the runs.
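To put numbers on the gap, here is a quick sketch that converts the three totals you posted into hours and works out the difference between the first two runs. It only uses the figures from your output; the variable and function names are just illustrative.

```python
# Total run times in seconds, copied from the three SUCCESS lines above.
run_seconds = [11748, 30335, 22321]


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600


# Each run expressed in hours, rounded for readability.
run_hours = [round(to_hours(s), 2) for s in run_seconds]

# How much longer the second run took than the first, in hours.
gap_hours = round(to_hours(run_seconds[1] - run_seconds[0]), 2)

print("run times (h):", run_hours)
print("second minus first (h):", gap_hours)
```

That works out to roughly 3.3, 8.4 and 6.2 hours, so the slowest run took more than 2.5 times as long as the fastest.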

You should be able to see in the output, as well as in the metrics, which steps are taking so long. Take a look at the logs and let us know what you find.
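Once you have per-step durations for a fast run and a slow run, something like the sketch below can rank the steps by how much slower they got. It is not part of Rally: the function name `rank_slowdowns`, the task names, and the durations are all made up for illustration, and you would fill the dictionaries in by hand from your own output or metrics.

```python
def rank_slowdowns(baseline, contender):
    """Return (task, extra_seconds) pairs, largest slowdown first.

    Both arguments map a task name to its duration in seconds.
    Tasks that appear in only one of the two runs are skipped.
    """
    shared = baseline.keys() & contender.keys()
    deltas = [(task, contender[task] - baseline[task]) for task in shared]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


# Placeholder durations, NOT real measurements: replace them with the
# per-task times taken from your own fast and slow runs.
fast_run = {"task-a": 100.0, "task-b": 250.0}
slow_run = {"task-a": 110.0, "task-b": 900.0}

for task, extra in rank_slowdowns(fast_run, slow_run):
    print(f"{task}: {extra:+.0f} s")
```

If one step accounts for most of the extra time, that is the one to dig into in the logs.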

Thanks

Gareth