How to Calculate Throughput (Ops/sec) from ESRally Report.md

Hi All,
I am running 232 queries per minute through ESRally, using 232 clients.

Below is an excerpt from my report.md file.

All,Min Throughput,custom_simple_query_228,0.13,ops/s
All,Median Throughput,custom_simple_query_228,0.17,ops/s
All,Max Throughput,custom_simple_query_228,0.17,ops/s
All,50th percentile latency,custom_simple_query_228,254735.82665342838,ms
All,90th percentile latency,custom_simple_query_228,453923.8566603046,ms
All,99th percentile latency,custom_simple_query_228,498532.68128809053,ms
All,100th percentile latency,custom_simple_query_228,502637.6438322477,ms
All,50th percentile service time,custom_simple_query_228,5885.677504120395,ms
All,90th percentile service time,custom_simple_query_228,6185.520075401291,ms
All,99th percentile service time,custom_simple_query_228,6512.958331475965,ms
All,100th percentile service time,custom_simple_query_228,7637.462149839848,ms
All,error rate,custom_simple_query_228,0.00,%
All,Min Throughput,custom_simple_query_229,0.13,ops/s
All,Median Throughput,custom_simple_query_229,0.16,ops/s
All,Max Throughput,custom_simple_query_229,0.16,ops/s
All,50th percentile latency,custom_simple_query_229,253916.0132848192,ms
All,90th percentile latency,custom_simple_query_229,453343.4856909327,ms
All,100th percentile latency,custom_simple_query_229,503350.4834943451,ms
All,50th percentile service time,custom_simple_query_229,6121.4104143437,ms
All,90th percentile service time,custom_simple_query_229,6363.997411495075,ms
All,100th percentile service time,custom_simple_query_229,7758.432420901954,ms
All,error rate,custom_simple_query_229,0.00,%
All,Min Throughput,custom_simple_query_230,0.11,ops/s
All,Median Throughput,custom_simple_query_230,0.13,ops/s
All,Max Throughput,custom_simple_query_230,0.14,ops/s
All,50th percentile latency,custom_simple_query_230,263817.0460090041,ms
All,90th percentile latency,custom_simple_query_230,469282.91428880766,ms
All,100th percentile latency,custom_simple_query_230,520855.9247748926,ms
All,50th percentile service time,custom_simple_query_230,7394.637248013169,ms
All,90th percentile service time,custom_simple_query_230,7630.584071855992,ms
All,100th percentile service time,custom_simple_query_230,8723.47238380462,ms
All,error rate,custom_simple_query_230,0.00,%
All,Min Throughput,custom_simple_query_231,0.13,ops/s
All,Median Throughput,custom_simple_query_231,0.17,ops/s
All,Max Throughput,custom_simple_query_231,0.18,ops/s
All,50th percentile latency,custom_simple_query_231,250339.09847494215,ms
All,90th percentile latency,custom_simple_query_231,446924.7991587036,ms
All,99th percentile latency,custom_simple_query_231,491856.99790116394,ms
All,100th percentile latency,custom_simple_query_231,496650.9179570712,ms
All,50th percentile service time,custom_simple_query_231,5686.616131104529,ms
All,90th percentile service time,custom_simple_query_231,6000.503335334361,ms
All,99th percentile service time,custom_simple_query_231,6214.789305739104,ms
All,100th percentile service time,custom_simple_query_231,7654.282569885254,ms
All,error rate,custom_simple_query_231,0.00,%
All,Min Throughput,custom_simple_query_232,0.12,ops/s
All,Median Throughput,custom_simple_query_232,0.14,ops/s
All,Max Throughput,custom_simple_query_232,0.14,ops/s
All,50th percentile latency,custom_simple_query_232,263251.6680445988,ms
All,90th percentile latency,custom_simple_query_232,468660.6013576966,ms
All,100th percentile latency,custom_simple_query_232,518969.1438791342,ms
All,50th percentile service time,custom_simple_query_232,6980.2779571618885,ms
All,90th percentile service time,custom_simple_query_232,7212.950173066929,ms
All,100th percentile service time,custom_simple_query_232,8547.225694172084,ms
All,error rate,custom_simple_query_232,0.00,%

So, how do I calculate the overall ops/sec for the search queries?

--Regards,
Balmukund

Hi,

If I understand you correctly, you run multiple queries (in parallel?) and want to calculate a single throughput metric for all queries together (e.g. throughput across all queries: X ops/s). While it would be (theoretically) possible to calculate this (see Rally's source code for how it's done on a per-task basis), I'd be interested to hear how you'd use that metric.

Daniel

Hi Daniel,
Thank you very much for your quick response. Yes, you are right, I'm running multiple queries in parallel.
Suppose:
Query q1 has p1 ops per second,
Query q2 has p2 ops per second,
Query q3 has p3 ops per second.

Currently, I calculate the total ops/sec by simply adding them up: (p1+p2+p3).
I just wanted your confirmation on whether that is correct.

--Regards,
Balmukund

Hi Balmukund,

The number you're calculating is based on summary statistics. If you add up, e.g., the maximum throughput of all queries, it is very likely that you overestimate what the system can achieve, because the maximum for each individual query could be reached at different points in time during the benchmark. I also noticed that the difference between service time and latency is very high in some cases; this indicates that your benchmark is not in a stable state, i.e. your target throughput is too high (see our FAQ and the workload section of our blog post Seven Tips for Better Elasticsearch Benchmarks for details).
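To illustrate why a large gap between latency and service time points to an overloaded schedule, here is a minimal sketch in Python. It assumes a simplified model (one client, a fixed service time, requests scheduled at a fixed interval) and defines latency as waiting time plus service time. The function name and all numbers are hypothetical and are not taken from the benchmark configuration above.

```python
def simulate_latencies(target_interval_s, service_time_s, n_requests):
    """Latencies of a single client that issues requests on a fixed schedule.

    Simplified model (an assumption): every request takes exactly
    service_time_s, and a request cannot start before the previous one
    has finished.
    """
    latencies = []
    free_at = 0.0  # time at which the client finishes its previous request
    for i in range(n_requests):
        scheduled = i * target_interval_s      # when the request should start
        start = max(scheduled, free_at)        # it may have to wait its turn
        free_at = start + service_time_s
        latencies.append(free_at - scheduled)  # latency = waiting + service time
    return latencies


# Target of one request per second, but each request takes 6 seconds:
overloaded = simulate_latencies(target_interval_s=1.0, service_time_s=6.0, n_requests=100)

# Target of one request every 10 seconds with the same service time:
stable = simulate_latencies(target_interval_s=10.0, service_time_s=6.0, n_requests=100)

print(overloaded[0], overloaded[-1])  # latency keeps growing with every request
print(stable[0], stable[-1])          # latency stays equal to the service time
```

In the overloaded case the service time stays constant while the latency grows with every request, because each request waits for all the requests queued before it. In the stable case latency and service time are identical.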

In any case, to get an accurate picture I think you'd need to calculate the achieved throughput based on the raw samples. I don't know what these queries represent but I imagine if they are issued by your application to process a single customer request then it would be better to write a custom runner that executes all the operations that your application executes in the same order. This would be more realistic and also Rally would automatically show the correct metrics already for the high-level operation you're interested in. An alternative to this approach would be to benchmark your application directly (e.g. with JMeter or other load testing tools) as this would provide you with end-to-end metrics.
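As a rough sketch of the difference between adding up per-query maxima and calculating throughput from raw samples, consider the following Python example. It is not Rally's own throughput calculation; the completion timestamps, the 10-second window and the helper name are made up for illustration.

```python
from collections import Counter


def throughput_per_window(timestamps_s, window_s):
    """Operations per second in each time window, keyed by window index."""
    counts = Counter(int(t // window_s) for t in timestamps_s)
    return {window: count / window_s for window, count in counts.items()}


# Hypothetical completion times (in seconds) of the raw samples of two queries.
samples = {
    "q1": [0.5, 1.0, 1.5, 2.0, 12.0],     # busiest in the first window
    "q2": [3.0, 11.0, 12.0, 13.0, 14.0],  # busiest in the second window
}
window_s = 10

# Approach 1: add up the maximum throughput of each query.
sum_of_maxima = sum(
    max(throughput_per_window(ts, window_s).values()) for ts in samples.values()
)

# Approach 2: pool the raw samples first, then calculate the throughput.
pooled = [t for ts in samples.values() for t in ts]
max_combined = max(throughput_per_window(pooled, window_s).values())

print(sum_of_maxima)  # 0.8 ops/s, never actually reached at any point in time
print(max_combined)   # 0.5 ops/s, the highest rate the system really sustained
```

The two queries peak in different windows, so the sum of their maxima describes a rate that the system never delivered.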

Daniel

Hi Daniel,
Thank you very much for your response. You are right: if I consider the maximum throughput of all queries, it is very likely that we overestimate what the system can achieve. Hence, I am using All,Median Throughput.
Also, I am using Rally's 1 Billions track to test the system for CPU, memory and disk I/O utilization.
Below is my simple query:
"query": {
  "match": {
    "nginx.access.geoip.city_name": "Frankfort"
  }
}

Also, I calculate the average response time by summing the "All,99th percentile service time" values of all queries and dividing by the number of queries.
I.e. if query 1 has an All,99th percentile service time of t1
and query 2 has an All,99th percentile service time of t2,
then average response time = (t1+t2)/2.

Please let me know if my calculation is wrong.

--Regards,
Balmukund

Hi,

I chose the maximum as an example where it is quite clear that you might overestimate the system's true capabilities but a similar reasoning applies to all summary statistics. I've provided an alternative to that in my earlier answer.

I guess by "average response time" you mean "average 99th percentile service time"? Unfortunately, percentiles (as well as minimum, median and maximum) only make sense in the context of the measurements for which they have been taken and you cannot aggregate them. You have two ways out of this: calculate your summary statistics based on the raw samples that you'll find in the index rally-metrics-* in your Elasticsearch metrics store, or use the mean value in rally-results-* that is available since Rally 1.1.0. However, I don't think that you should summarize the results of two different queries. Let me provide an analogy: say it takes a truck 10 hours to drive from A to B and it takes a sports car 4 hours to cover the same distance. Averaging the times of the sports car and the truck gives (10 + 4)/2 = 7 hours. I am not sure how calculating this number helps you.
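A small Python sketch of why percentiles cannot be aggregated: the average of two 99th percentiles generally differs from the 99th percentile of the pooled raw samples. The sample values are made up, and the nearest-rank percentile helper is a simplification for illustration, not the method Rally uses.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical service times in ms for two different queries.
fast_query = [10] * 100                # always takes 10 ms
slow_query = [100] * 90 + [1000] * 10  # usually 100 ms, sometimes 1000 ms

# Averaging the two percentiles:
average_of_p99 = (percentile(fast_query, 99) + percentile(slow_query, 99)) / 2

# Calculating the percentile on the pooled raw samples:
pooled_p99 = percentile(fast_query + slow_query, 99)

print(average_of_p99)  # 505.0
print(pooled_p99)      # 1000
```

The averaged value of 505 ms does not correspond to any request that was actually measured, while the pooled percentile reflects what the slowest 1% of all requests experienced.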

A great video is How NOT to measure latency, which explains in detail what can go wrong when measuring latency and how to do it right.

Daniel

Hi Daniel,
Sorry for the delayed response, and thank you very much for your reply. I understand that taking the average of the 99th percentiles does not make sense. But I'm unable to find the rally-metrics-* or rally-results-* files.
It would be great if you could give me the path to these files.

--Regards,
Balmukund

Hi,

These are not files but index patterns that match the indices Rally creates when you set up a dedicated Elasticsearch metrics store. See also our documentation about metrics records.
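As a rough idea of what a query against these indices could look like, here is a Python sketch that only builds a search body; it would be sent to rally-metrics-*/_search on the metrics store. The field names (name, task, sample-type, value) are assumptions based on the metrics record format and should be checked against the documentation for your Rally version. The helper name is hypothetical.

```python
def service_time_query(task_name, percents=(50, 90, 99, 100)):
    """Build a search body for the raw service-time samples of one task.

    The field names are assumptions about Rally's metrics records;
    verify them against your own rally-metrics-* documents.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name": "service_time"}},
                    {"term": {"task": task_name}},
                    {"term": {"sample-type": "normal"}},  # skip warmup samples
                ]
            }
        },
        "aggs": {
            "mean_service_time": {"avg": {"field": "value"}},
            "service_time_percentiles": {
                "percentiles": {"field": "value", "percents": list(percents)}
            },
        },
    }


body = service_time_query("custom_simple_query_228")
print(body)
```

The body filters on a single task on purpose, so the statistics are calculated from the raw samples of one query rather than mixed across different queries.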

Daniel