The principle of sorting queries

I simulated some discrete queries using esrally and found something that confuses me. The first screenshot is a run without sort and the second is a run with sort; the search QPS is the same.
The 99.9th percentile service time without sort is smaller than with sort, but the 100th percentile service time without sort is bigger than with sort. Why?

Thank you for your interest in Rally!

The 100th percentile is the slowest operation of the whole run. Many things can affect it, such as CPU utilization on the Elasticsearch cluster and on the load driver, the Java garbage collector, the Python garbage collector, disk accesses, etc. I'd suggest running those benchmarks multiple times; that will give you an idea of how reliable those numbers are.
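To see why the 100th percentile moves around so much more than the 99.9th: it is literally the single slowest sample of the run, so one stall is enough to shift it, while the 99.9th percentile is backed by many more samples. Here is a minimal sketch with made-up numbers (nothing below comes from your benchmark):

```python
import random
import statistics

random.seed(0)

def run_once(n=10_000):
    # Hypothetical service times: ~20 ms baseline plus an occasional long stall.
    times = [random.gauss(20, 2) for _ in range(n)]
    if random.random() < 0.5:                    # e.g. a GC pause in some runs
        times[random.randrange(n)] += random.uniform(500, 3000)
    cuts = statistics.quantiles(times, n=1000)   # cut points at 0.1% steps
    return cuts[-1], max(times)                  # ~p99.9 and p100 (the max)

for i in range(5):
    p999, p100 = run_once()
    print(f"run {i}: p99.9 = {p999:6.1f} ms   p100 = {p100:7.1f} ms")
```

Across runs, p99.9 barely moves, while p100 jumps by orders of magnitude depending on whether a stall happened at all.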

I would also suggest looking at the difference between latency and service_time: they differ in your case, which shows there is a queuing effect, and queuing makes the higher percentiles even less stable.
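As a rough sketch of that queuing effect (a simplified single-client model with made-up numbers, not Rally's actual scheduler): latency is measured from the scheduled start of a request, service_time from its actual start, so once the schedule outruns the system, latency keeps growing while service_time stays flat.

```python
def simulate(target_throughput, service_time_s, n=1000):
    interval = 1.0 / target_throughput        # scheduled gap between requests
    finish = 0.0                              # completion time of the previous request
    last_latency = last_service = 0.0
    for i in range(n):
        scheduled_start = i * interval
        actual_start = max(scheduled_start, finish)   # wait while the client is busy
        finish = actual_start + service_time_s
        last_latency = finish - scheduled_start       # includes the queuing delay
        last_service = finish - actual_start          # round trip only
    return last_latency, last_service

# Assumed numbers purely for illustration: asking 100 ops/s from a single
# client whose requests take 44 ms each (~23 ops/s of actual capacity).
lat, svc = simulate(target_throughput=100, service_time_s=0.044)
print(f"latency after 1000 requests: ~{lat:.0f} s, service_time: {svc * 1000:.0f} ms")
```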

Finally, why are you saying that search qps is the same? I can see a median throughput of 26 ops/s in the upper screenshot and 107 ops/s in the lower one.

Thanks for your reply. Sorry, I used the same target-throughput and client count for both runs, so I carelessly assumed the QPS was the same.
What you said is right. I wanted to run a stress test, so I set the target throughput very high; CPU usage went above 90%, requests started to accumulate, and that affected the latency metric. So should I set target-throughput to a smaller value?

It depends on what you're trying to achieve, but yes, I would usually recommend setting the target-throughput below the ops/s you're actually seeing. Using the lower screenshot as an example: while the median service time is 44 ms, the median latency is 143 s, more than 3,000 times higher! At this point the system isn't really usable.
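As a rough sketch of what "below the ops/s you're actually seeing" could mean in numbers (the 20% margin below is an assumption, not a Rally rule):

```python
observed_throughput = 107   # median ops/s from the lower screenshot
headroom = 0.8              # assumption: leave ~20% of margin
print(f"suggested target-throughput: ~{headroom * observed_throughput:.0f} ops/s")
```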


Thanks. Let me ask an unrelated question: in a term query, is the keyword type less expensive than the integer type? I found that a term query on an integer field uses the BKD tree and gets rewritten into a range-style query.
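For illustration, this is the comparison I mean (hypothetical index and field names, not my real ones):

```python
# Hypothetical mappings and queries, just to show what is being compared.
mappings = {
    "properties": {
        "user_id_kw":  {"type": "keyword"},   # term lookup against the inverted index
        "user_id_int": {"type": "integer"},   # numeric field, indexed as BKD points
    }
}

term_on_keyword = {"query": {"term": {"user_id_kw": "12345"}}}
term_on_integer = {"query": {"term": {"user_id_int": 12345}}}
```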

I don't know, sorry. I would recommend opening a new topic.

Excuse me, here is another case with the same configuration and query conditions: in the first run I set the query size parameter to 10, and in the second I set it to 100, with target-throughput set to 100 in both. The first run reports a throughput of about 100 ops/s, but the second only reaches about 40 ops/s and the latency is very large. I ran GET _tasks during the second run and saw very few search tasks, and CPU and memory usage were normal (less than 40%). Why is there such a big difference?
elasticsearch version: 7.10.2
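For reference, the two request bodies differ only in size (the query itself is a placeholder, not my real one):

```python
base_query = {"query": {"match": {"message": "example"}}}   # placeholder query

first_run  = {**base_query, "size": 10}    # reaches ~100 ops/s
second_run = {**base_query, "size": 100}   # only ~40 ops/s, much higher latency
```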
