You are not the only one who is confused. The vast majority of load testing tools get latency wrong, and `ab` is one of them (nothing against `ab`, it is a fine tool, but you need to be aware of the difference). What `ab` calls "latency" is actually "service time". To be 100% precise: if you do not throttle throughput, then "latency" == "service time". But measuring query latency that way is wrong to begin with, for the very reason detailed in Relating Service Utilisation to Latency.
Long story short: if you want to compare the numbers you get from `ab` with the numbers you get from Rally, you need to compare `ab`'s latency numbers to Rally's service time numbers. But then you also need to run a comparable benchmark with Rally (see the track snippet after this list), which means:
- 50 concurrent clients
- 500 iterations
- No throughput throttling
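For illustration, here is a minimal sketch of how such a task could look in the `schedule` of a Rally track. The operation name `my-query` and the surrounding track structure are placeholders for whatever your track actually defines; the point is that `clients` and `iterations` match your `ab` run, and that `target-throughput` is deliberately omitted, which leaves the clients unthrottled:

```json
{
  "schedule": [
    {
      "operation": "my-query",
      "clients": 50,
      "iterations": 500
    }
  ]
}
```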
But IMHO the more important point is to measure query latency correctly.
This is the same problem again. Let me explain it with an analogy:
Suppose you are a barista in a coffee shop, and let's assume it takes you one minute to prepare a coffee. This is the time your server is busy servicing a customer's request, and this is what we call service time. If less than one customer per minute enters your coffee shop, you are less than 100% utilized. Thus every customer can order their coffee immediately (i.e. there is no waiting line).
Now suppose two customers per minute enter the coffee shop (2 ops/minute). The "problem" is that it still takes you one minute to prepare one coffee, i.e. the customers inject more load than the system can handle (your maximum throughput is 1 op/minute). What happens? A waiting line builds up, and latency is telling you exactly that: it takes the waiting time of customers into account. If you just look at the service time, everything is "fine": it still takes you one minute to prepare a coffee no matter how many customers enter. But as customers enter the coffee shop twice as fast as you can service them, the waiting line will grow and grow, and so will latency.
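To make that concrete, here is a minimal sketch that simulates the single-barista queue (the 60 s service time and 30 s arrival interval are just the numbers from the analogy):

```python
# A single server with a fixed 60 s service time, and customers arriving
# every 30 s (2 ops/minute, twice the sustainable rate of 1 op/minute).
SERVICE_TIME = 60.0      # seconds the barista needs per coffee
ARRIVAL_INTERVAL = 30.0  # a new customer walks in every 30 seconds

server_free_at = 0.0
for n in range(10):
    arrival = n * ARRIVAL_INTERVAL
    start = max(arrival, server_free_at)  # wait until the barista is free
    finish = start + SERVICE_TIME
    server_free_at = finish
    service_time = finish - start         # stays at 60 s for every customer
    latency = finish - arrival            # waiting time + service time
    print(f"customer {n:2d}: service time = {service_time:5.1f} s, latency = {latency:6.1f} s")
```

Service time is a constant 60 s for every customer, but latency grows by 30 s per customer, exactly because the waiting line keeps growing.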
The takeaway is: you need to reduce the target throughput to a level that is sustainable for the system, i.e. a level where latency and service time are close. In your case I'd guess that this is somewhere between 20 ops/s and 25 ops/s (but you need to measure this).
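In Rally you can express such a cap with the task's `target-throughput` parameter, specified in operations per second across all clients. A sketch, reusing the placeholder operation from above:

```json
{
  "schedule": [
    {
      "operation": "my-query",
      "clients": 50,
      "iterations": 500,
      "target-throughput": 20
    }
  ]
}
```

If the measured latency then stays close to the service time, the load is sustainable; if latency starts to climb, lower `target-throughput` further.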
These numbers look fine indeed.