Can't reach Rally Target throughput

Hi,

I have a 4-node cluster and I'm benchmarking it with Rally. In "track.json" I set the number of "clients" for a search operation to 12, with a "target-throughput" of 2500. However, Rally reported only about 2000 operations per second against the target of 2500 req/sec. I have increased the number of clients to 18, 20, 24, 30 and so on, but the reported search throughput has not gone beyond 2000 req/sec.

Is this due to a limitation of the server (co-ordinator), of Rally, or of the cluster setup?

I'm running the benchmark co-ordinator on an instance with a 4-core CPU and 16 GB of memory.

Thanks,
Sunil

It is difficult to determine what is limiting throughput without seeing what the load is on the cluster and on the Rally node. Is CPU saturated on the Rally node when you reach 2000 QPS? What does CPU usage on the Elasticsearch cluster look like? Depending on how much data each query returns, network performance on the Rally node could also be a limiting factor.
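
To get a feel for whether the network could be the limit, here is a rough back-of-the-envelope sketch. The response sizes are made-up assumptions (the thread does not say how much data each query returns), and the function name is only illustrative.

```python
def required_bandwidth_mbit(qps: float, response_bytes: float) -> float:
    """Return the bandwidth in Mbit/s needed to receive `qps` responses per second."""
    return qps * response_bytes * 8 / 1_000_000


# Hypothetical response sizes; replace with what your queries actually return.
for size_kb in (1, 10, 50):
    mbit = required_bandwidth_mbit(2000, size_kb * 1000)
    print(f"{size_kb:>3} kB responses at 2000 QPS need about {mbit:,.0f} Mbit/s")
```

Under these assumptions, 50 kB responses at 2000 QPS would need about 800 Mbit/s, which is close to what a 1 Gbit/s link can deliver, while 1 kB responses would barely register.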

I have to check the CPU usage. In the meantime, can network bandwidth cause huge latency in the metrics? When I check the latency for a target throughput of 2000 req/sec it is around 4000 ms, but for 1000 req/sec it is 3-4 ms.

Look at service time in the results and see if this varies with throughput. It is also possible that you are hitting the limit of what your cluster can handle. Do you have monitoring installed?

Service time is not changing; it is always around 4-5 ms irrespective of the target throughput, but latency is huge (on the order of seconds).
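
As a rough sanity check on the client counts, here is a small sketch. It assumes each client waits for a response before sending its next request, so that throughput cannot exceed clients divided by service time. The 5 ms figure is taken from the numbers above; the function name is only illustrative.

```python
def max_throughput(clients: int, service_time_s: float) -> float:
    """Upper bound on ops/s if every client issues one request at a time."""
    return clients / service_time_s


SERVICE_TIME_S = 0.005  # roughly the 4-5 ms service time reported in this thread

for clients in (12, 18, 24, 30):
    bound = max_throughput(clients, SERVICE_TIME_S)
    print(f"{clients} clients: at most about {bound:,.0f} ops/s")
```

Under that assumption, 12 clients would top out at about 2400 ops/s, just below the 2500 target, while 30 clients would allow about 6000 ops/s. Since throughput stayed near 2000 even with more clients, the client count alone probably does not explain the ceiling.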

Hi,

I assume that you are not talking about the 50th percentile but rather the 100th percentile (i.e. maximum). You should start with a target throughput of the inverse of that, so maybe at most 250 ops/s (1 operation / 4 ms), lower that a bit to maybe 200 ops/s (1 operation / 5 ms), and gradually increase throughput from there. If latency is much higher than service time, that indicates that the benchmark does not reach a steady state and that the system in that specific setup cannot cope with the throughput you are aiming for. See also our FAQ for more info. I'd also suggest starting with a single client and then gradually increasing the number of clients.

Daniel
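
To illustrate why latency can reach seconds while service time stays at a few milliseconds, here is a toy simulation. It is a simplified model and not Rally's actual implementation: one client, a fixed 5 ms service time, and requests that are due on a fixed schedule derived from the target throughput. Latency is counted from the moment a request was due, so it includes the time spent waiting behind earlier requests.

```python
def simulate_latency(target_throughput: float, service_time_s: float,
                     num_requests: int) -> list[float]:
    """Return per-request latency in seconds for one client on a fixed schedule."""
    interval = 1.0 / target_throughput
    previous_finish = 0.0
    latencies = []
    for i in range(num_requests):
        scheduled = i * interval                  # when the request was due
        start = max(scheduled, previous_finish)   # it may have to wait its turn
        previous_finish = start + service_time_s  # service time never changes
        latencies.append(previous_finish - scheduled)
    return latencies


for target in (150, 200, 400):
    lat = simulate_latency(target, 0.005, 1000)
    print(f"target {target} ops/s: first request {lat[0] * 1000:.1f} ms, "
          f"last request {lat[-1] * 1000:.1f} ms")
```

In this model a target of 150 ops/s keeps latency at the 5 ms service time, whereas at 400 ops/s the backlog grows with every request and the last of 1000 requests ends up with a latency of roughly 2.5 seconds, even though each request still takes only 5 ms to serve.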

With regard to checking the load, network utilization and IO usage on your load-generating machine, I recommend going over https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55 which I believe is based on Brendan Gregg's USE Method/Performance checklist.
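
As a quick first look before going through the full checklist, here is a minimal sketch that compares the 1-minute load average with the number of cores on the load generator. It uses only the Python standard library; the threshold of 1.0 per core is an assumption, and load average is only a coarse hint, so tools like the ones in the article are still needed for a real diagnosis.

```python
import os


def cpu_saturated(load_average: float, cores: int, threshold: float = 1.0) -> bool:
    """Return True if the load per core is at or above `threshold`."""
    return load_average / cores >= threshold


# os.getloadavg() is only available on Unix-like systems.
if hasattr(os, "getloadavg"):
    try:
        load1, _load5, _load15 = os.getloadavg()
        cores = os.cpu_count() or 1
        print(f"1-minute load {load1:.2f} on {cores} cores, "
              f"saturated: {cpu_saturated(load1, cores)}")
    except OSError:
        print("Load average is not available on this system.")
```

Run it on the Rally node while the benchmark is at its throughput ceiling. On a 4-core machine, a load average that stays at or above 4 would suggest the load generator itself is the bottleneck.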

@danielmitterdorfer sure will try that. Thanks :slight_smile:
