Why is the percentile latency several times higher than the service time?


(A943013827) #1

My results are as follows:
| Lap | Metric | Task | Value | Unit |
|-----|--------|------|-------|------|
| All | Min Throughput | phrase | 109.962 | ops/s |
| All | Median Throughput | phrase | 122.17 | ops/s |
| All | Max Throughput | phrase | 185.003 | ops/s |
| All | 50.0th percentile latency | phrase | 1.52168e+06 | ms |
| All | 90.0th percentile latency | phrase | 3.51342e+06 | ms |
| All | 99.0th percentile latency | phrase | 4.04594e+06 | ms |
| All | 99.9th percentile latency | phrase | 4.0922e+06 | ms |
| All | 99.99th percentile latency | phrase | 4.09765e+06 | ms |
| All | 100th percentile latency | phrase | 4.09826e+06 | ms |
| All | 50.0th percentile service time | phrase | 9.77154 | ms |
| All | 90.0th percentile service time | phrase | 42.1382 | ms |
| All | 99.0th percentile service time | phrase | 100.384 | ms |
| All | 99.9th percentile service time | phrase | 203.096 | ms |
| All | 99.99th percentile service time | phrase | 479.011 | ms |
| All | 100th percentile service time | phrase | 1097.27 | ms |
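For scale, converting the reported latency percentiles from milliseconds into minutes makes the gap obvious (a quick sketch using the values from the table above):

```python
# Latency percentiles from the Rally report above, in milliseconds.
percentile_latency_ms = {
    "50.0th": 1.52168e+06,
    "90.0th": 3.51342e+06,
    "100th":  4.09826e+06,
}

# Convert to minutes for readability.
for pct, ms in percentile_latency_ms.items():
    print(f"{pct}: {ms / 60000:.1f} minutes")
# → 50.0th: 25.4 minutes, 90.0th: 58.6 minutes, 100th: 68.3 minutes
```

So half of all requests spent roughly 25 minutes or more between scheduled submission and completion, while the median service time was under 10 ms.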

I ran Rally with phrase searching and index appending in parallel, but the percentile latency is several times higher than the service time. What might be the cause?


ES 6.1.2 Cluster shows performance bottleneck
(Daniel Mitterdorfer) #2

Hi @a943013827,

The short answer is that the Elasticsearch configuration you've benchmarked cannot cope with the load.

Citing from Rally's docs:

Latency: Time period between submission of a request and receiving the complete response. It also includes wait time, i.e. the time the request spends waiting until it is ready to be serviced by Elasticsearch.

Service_time: Time period between start of request processing and receiving the complete response. This metric can easily be mixed up with latency but does not include waiting time. This is what most load testing tools refer to as “latency” (although it is incorrect).
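The distinction can be illustrated with a minimal sketch (this is not Rally's actual code; the 200 ops/s target rate and the fixed 10 ms service time are hypothetical illustration values). A generator that schedules requests at a fixed rate measures latency from the *scheduled* submission time, so once the system falls behind, the wait time compounds:

```python
def simulate(target_rate_ops_s, service_time_s, n_requests):
    """Simulate a single-worker load generator with a fixed request schedule."""
    interval = 1.0 / target_rate_ops_s   # scheduled gap between requests
    clock = 0.0                          # time when the worker is next free
    latencies, service_times = [], []
    for i in range(n_requests):
        scheduled = i * interval         # when the request *should* start
        start = max(clock, scheduled)    # it may have to wait its turn
        finish = start + service_time_s
        clock = finish
        service_times.append(finish - start)   # service time: no wait included
        latencies.append(finish - scheduled)   # latency: includes wait time
    return latencies, service_times

# If each request takes 10 ms but we schedule 200 per second (one every 5 ms),
# the backlog grows linearly: latency climbs while service time stays at 10 ms.
lat, svc = simulate(target_rate_ops_s=200, service_time_s=0.010, n_requests=1000)
```

This is exactly the pattern in the report above: service time stays small and stable, while latency grows to minutes because every request inherits the accumulated backlog.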

You can also see from the high variation in throughput that it is not able to cope with the load (I guess you specified either no target throughput or a target of 200 ops/s).

If you're interested in the theory behind all this, the article Relating Service Utilisation to Latency is a good introduction in my opinion.
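As a rough back-of-the-envelope illustration of that theory (my numbers, not from this thread): in a simple M/M/1 queue the mean time in system is W = 1/(μ − λ), so as the arrival rate λ approaches the service rate μ, latency grows without bound even though the per-request service time 1/μ never changes:

```python
# M/M/1 queue: mean time in system W = 1 / (mu - lam).
mu = 100.0                          # service rate: 100 ops/s (10 ms per request)
for lam in (50.0, 90.0, 99.0, 99.9):  # offered arrival rates in ops/s
    W = 1.0 / (mu - lam)            # mean latency in seconds
    print(f"lambda={lam} ops/s -> mean latency {W * 1000:.0f} ms")
# → 20 ms at half load, 100 ms at 90%, 1000 ms at 99%, 10000 ms at 99.9%
```

At 50% utilisation the mean latency is only twice the service time, but near saturation it is dominated entirely by queueing delay, which matches the behaviour observed in this benchmark.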

Daniel


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.