Can't reach Rally Target throughput

Sunil_pandith · September 14, 2018, 11:45am

Hi,

I have a 4-node cluster and I'm benchmarking it using Rally. In "track.json" I set the number of "clients" for a search operation to 12 with a "target-throughput" of 2500. However, Rally reported that it performed only about 2000 operations per second against the target of 2500 req/sec. I increased the number of clients to 18, 20, 24, 30 and so on, but the reported throughput for the search did not go beyond 2000 req/sec.

Is this due to a limitation of the server (coordinator), of Rally, or of the cluster setup?

I'm running the benchmark coordinator on a 4-core CPU, 16 GB memory instance.

Thanks,
Sunil

Christian_Dahlqvist · September 14, 2018, 12:24pm

It is difficult to determine what is limiting throughput without seeing what the load is on the cluster and on the Rally node. Is CPU saturated on the Rally node when you reach 2000 QPS? What does CPU usage on the Elasticsearch cluster look like? Depending on how much data each query returns, network performance on the Rally node could also be a limiting factor.
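A rough way to check whether the network could be the limit is to multiply the request rate by the average response size. The sketch below just does that arithmetic; the function name is hypothetical, and the 50 KB response size in the usage line is an assumed example value, not something measured in this thread.

```python
def required_mbit_per_s(ops_per_s: float, response_bytes: float) -> float:
    """Bandwidth in megabits/s needed to receive `response_bytes` per operation."""
    return ops_per_s * response_bytes * 8 / 1_000_000


# Hypothetical example: 2000 searches/s, each response ~50 KB
print(required_mbit_per_s(2000, 50_000))  # 800.0
```

Under that assumption the Rally node would need roughly 800 Mbit/s of inbound bandwidth, which is already close to what a 1 Gbit/s link can carry.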

Sunil_pandith · September 14, 2018, 3:08pm

I still have to check the CPU usage. In the meantime, can network bandwidth cause huge latency in the metrics? When I check the latency for a target throughput of 2000 req/sec it is around 4000 ms, but for 1000 req/sec it's 3-4 ms.

Christian_Dahlqvist · September 14, 2018, 3:14pm

Look at service time in the results and see if this varies with throughput. It is also possible that you are hitting the limit of what your cluster can handle. Do you have monitoring installed?

Sunil_pandith · September 17, 2018, 6:09am

Service time does not change; it is always around 4-5 ms regardless of the target throughput, but latency is huge (on the order of seconds).

danielmitterdorfer · September 17, 2018, 10:14am

Hi,

I assume that you are not talking about the 50th percentile but rather the 100th percentile (i.e. the maximum). You should start with a target throughput equal to the inverse of the service time, so at most about 250 ops/s (1 operation / 4 ms); lower that a bit to maybe 200 ops/s and gradually increase throughput from there. If latency is much higher than service time, the benchmark is not reaching a steady state and the system in that specific setup cannot cope with the throughput you are aiming for. See also our FAQ for more info. I'd also suggest starting with a single client and then gradually increasing the number of clients.
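The relationship between target throughput, service time and latency can be illustrated with a simplified model. It assumes a fixed service time per request, that the target throughput is split evenly across clients, and that each client sends its requests one after another on its own schedule; it is a sketch of the idea, not Rally's actual scheduler, and both function names are hypothetical. Latency here is measured from the scheduled start time, so when the schedule outpaces what a client can complete, requests queue up and latency grows while service time stays flat.

```python
import statistics


def max_throughput(service_time_s: float, clients: int = 1) -> float:
    """Idealized ceiling: each client issues requests back to back, no overhead."""
    return clients / service_time_s


def simulate_latencies(target_ops_s: float, service_time_s: float,
                       clients: int, requests_per_client: int) -> list[float]:
    """Latency (completion time minus scheduled time) of each request of one client.

    All clients behave identically in this model, so simulating one is enough.
    """
    interval = clients / target_ops_s  # each client's share of the target rate
    free_at = 0.0  # time at which the client finishes its previous request
    latencies = []
    for k in range(requests_per_client):
        scheduled = k * interval
        start = max(scheduled, free_at)  # wait if the previous request is still running
        free_at = start + service_time_s
        latencies.append(free_at - scheduled)
    return latencies


if __name__ == "__main__":
    service = 0.005  # ~5 ms service time, as reported in this thread
    print(f"1-client ceiling: {max_throughput(service):.0f} ops/s")
    print(f"12-client ceiling: {max_throughput(service, 12):.0f} ops/s")
    for target in (2000, 2500):
        lat = simulate_latencies(target, service, clients=12, requests_per_client=1000)
        print(f"target={target}: median={statistics.median(lat) * 1000:.1f} ms, "
              f"max={max(lat) * 1000:.1f} ms")
```

In this model, 12 clients at 5 ms can complete at most 2400 ops/s. At a target of 2000 ops/s every request finishes within its 5 ms service time, while at 2500 ops/s each successive request waits about 0.2 ms longer than the previous one, so latency keeps climbing for as long as the run lasts even though service time never changes.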

Daniel

dliappis · September 17, 2018, 10:33am

Regarding checking the load, network utilization and I/O usage on your load-generating machine, I recommend going over https://medium.com/netflix-techblog/linux-performance-analysis-in-60-000-milliseconds-accc10403c55 which I believe is based on Brendan Gregg's USE Method/Performance checklist.
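The first command in that checklist is `uptime`, which prints the 1-, 5- and 15-minute load averages. A crude version of the same first check can be done with Python's standard library, as sketched below. The helper names and the threshold of 1.0 runnable task per CPU are assumptions; on Linux the load average also counts tasks blocked on disk I/O, so a high value is a prompt to look further (for example with `mpstat -P ALL 1` and `sar -n DEV 1` from the same article) rather than proof that the CPU is saturated.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Load average normalized by the number of CPUs."""
    return load_avg / cpu_count


def looks_saturated(load_avg: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """Heuristic with an assumed threshold: about one or more runnable tasks per CPU."""
    return load_per_cpu(load_avg, cpu_count) >= threshold


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # Unix only; the same numbers `uptime` shows
    cpus = os.cpu_count() or 1
    print(f"load 1/5/15 min: {one:.2f} {five:.2f} {fifteen:.2f} on {cpus} CPUs")
    print("possible saturation" if looks_saturated(one, cpus) else "load below 1 per CPU")
```

Run it on the Rally machine while the benchmark is active; on the 4-core instance described above, a 1-minute load of around 4 or more would be worth investigating.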

Sunil_pandith · September 17, 2018, 11:23am

@danielmitterdorfer Sure, will try that. Thanks!

system · October 15, 2018, 11:23am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
