Rally Summary Report and Kibana

Hi

I ran Rally benchmarking for one search operation on an existing index and got the results below:

| Lap | Metric | Task | Value | Unit |
|-----|--------|------|------:|------|
| All | Indexing time | | 274.278 | min |
| All | Merge time | | 398.326 | min |
| All | Refresh time | | 183.531 | min |
| All | Flush time | | 1.57647 | min |
| All | Merge throttle time | | 31.583 | min |
| All | Total Young Gen GC | | 20.896 | s |
| All | Total Old Gen GC | | 0.626 | s |
| All | Heap used for segments | | 274.622 | MB |
| All | Heap used for doc values | | 18.3885 | MB |
| All | Heap used for terms | | 208.191 | MB |
| All | Heap used for norms | | 9.35577 | MB |
| All | Heap used for points | | 6.70047 | MB |
| All | Heap used for stored fields | | 31.9862 | MB |
| All | Segment count | | 7117 | |
| All | Min Throughput | term | 2581 | ops/s |
| All | Median Throughput | term | 2625.46 | ops/s |
| All | Max Throughput | term | 2893.21 | ops/s |
| All | 50th percentile latency | term | 9001.67 | ms |
| All | 90th percentile latency | term | 12855.2 | ms |
| All | 99th percentile latency | term | 13742.1 | ms |
| All | 99.9th percentile latency | term | 14159.6 | ms |
| All | 99.99th percentile latency | term | 14401.8 | ms |
| All | 100th percentile latency | term | 14481.5 | ms |
| All | 50th percentile service time | term | 174.25 | ms |
| All | 90th percentile service time | term | 321.109 | ms |
| All | 99th percentile service time | term | 580.613 | ms |
| All | 99.9th percentile service time | term | 798.671 | ms |
| All | 99.99th percentile service time | term | 864.014 | ms |
| All | 100th percentile service time | term | 905.346 | ms |
| All | error rate | term | 0 | % |

Kibana:

I can see a throughput of around 2,800 ops/s, but when I tried to get an overview of Elasticsearch performance in Kibana, I couldn't relate the graphs to the results above.

As I understand it, throughput is the number of operations Elasticsearch handles per second. Is there any relation between Kibana's search rate and Rally's throughput?

You clearly drove the system into saturation. There is a huge difference between service time and latency (median service time is about 174 ms, while median latency is about 9 seconds), which means requests spend most of their time waiting rather than being processed. For more details, please see the FAQ "What does latency and service_time mean and how do they relate to the took field that Elasticsearch returns?". I suggest that you lower the target throughput. For more background, see the article Relating Service Utilisation to Latency, which is a great read IMHO.
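In a Rally track you lower the target throughput by setting `target-throughput` on the task in the challenge's `schedule`. Here is a minimal sketch; the operation name `term` matches the task in the report above, while the index name, query, client count, and the 100 ops/s target are assumed example values you would tune for your setup:

```json
{
  "schedule": [
    {
      "operation": {
        "name": "term",
        "operation-type": "search",
        "index": "my-index",
        "body": {
          "query": {
            "term": { "status": "active" }
          }
        }
      },
      "clients": 8,
      "warmup-time-period": 120,
      "target-throughput": 100
    }
  ]
}
```

Start well below the saturation point you measured (median throughput ~2,625 ops/s here) and raise the target gradually until latency begins to climb; that gives you the throughput the system can sustain.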

Secondly, Monitoring shows you data at a different (more coarse-grained) granularity than Rally, and it cannot know anything about the service time characteristics observed by a client, because it can only observe server-side behavior. By contrast, Rally measures end-to-end service time as observed by the client, including time spent waiting in the search thread pool's queue and potential network delays.
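To make that distinction concrete, here is a small sketch (not Rally's actual code) that compares the server-side `took` value, which is what Monitoring-style views are based on, with the time a client actually observes for the same request. It assumes the official `elasticsearch` Python client (8.x) and placeholder index/query names:

```python
import time

from elasticsearch import Elasticsearch

# Assumed connection details; adjust to your cluster.
es = Elasticsearch("http://localhost:9200")

start = time.perf_counter()
resp = es.search(index="my-index", query={"term": {"status": "active"}})
client_side_ms = (time.perf_counter() - start) * 1000

# `took` is the time Elasticsearch spent processing the search on the server.
# The client-observed time additionally includes network round trips and
# request/response serialization, so it is always at least as large.
print(f"server-side took:     {resp['took']} ms")
print(f"client-observed time: {client_side_ms:.1f} ms")
```

Under load, the gap between the two can grow substantially, which is exactly what Rally's client-side measurements capture and Monitoring cannot.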

