Possible reason for > 30% difference between iterations of geonames -> painless_static

During my task to compare performance between dockerized Elasticsearch and Elasticsearch running as a native service / from the shell, I discovered large fluctuations between tests, even when rerunning the same test set!

First I compared the overall time of the run:

So we can see about four minutes' difference in overall time.
I dug deeper to see which operation is causing this:

There are a few operations which are slower, but painless_static is the worst offender.

Above we see throughput, latency and service time. The second run is much slower.

But I am not able to find anything suspicious in the Metricbeat dashboards:

See 12:24 - 12:28 and compare with 13:02 - 13:09.

Here is my testing process:

  • delete old geonames-index via kibana
  • stop elasticsearch
  • stop kibana
  • start elasticsearch
  • start kibana
  • wait until elasticsearch is up
  • run esrally with external car, track geonames, challenge default.

Nothing else is running on the server where elasticsearch is tested (except for metricbeat).
esrally runs on a system shared with an Elastic dev system that has very low load. But I also stopped that dev system in previous runs, and the benchmarks showed the same fluctuation.

Any help is really appreciated.
Thanks a lot, Andreas

Hi Andreas,

to understand what's happening, I suggest watching my talk The Seven Deadly Sins of Elasticsearch Benchmarking (free to watch, but requires prior registration). Please check the item "sin 3", which covers your question extensively. See also the related blog post Seven Tips for Better Elasticsearch Benchmarks, which is a summary of the talk.


Hey Daniel, thanks for the reply.

Your talk was interesting. It's clear to me that latency will go up if we query faster than the system can respond, because of the growing queue.

What is not completely clear to me is why the service time varies that much. Do you think it is caused by the overload? And if I lower the target throughput, should the values become more stable?

Regards, Andreas

The talk (and in fact Rally as well) makes a simplifying assumption, namely that the benchmarked system can be modelled with a single queue (known as an M/M/1 queue in queueing theory). In practice, systems can have several queues: incoming network packets can queue up at the OS level, runnable processes queue up in the CPU scheduler's run queue, Elasticsearch has a queue in front of its thread pool, and if multiple Elasticsearch nodes are processing a query, even more queues are involved. So service time is only an approximation (although the best one we can get from a client's perspective), and that would explain why you see a varying service time.
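As a hypothetical illustration (this is not Rally's implementation; the rates, request count and seed are made up), a minimal M/M/1 simulation shows how measured latency inflates near saturation while per-request service time stays roughly constant:

```python
import random

def simulate_mm1(arrival_rate, service_rate, n=50_000, seed=42):
    """Simulate a FIFO M/M/1 queue; return (mean service time, mean latency)."""
    rng = random.Random(seed)
    t = 0.0            # arrival time of the current request
    server_free = 0.0  # time at which the server next becomes idle
    total_service = total_latency = 0.0
    for _ in range(n):
        t += rng.expovariate(arrival_rate)       # Poisson arrivals
        service = rng.expovariate(service_rate)  # exponential service demand
        start = max(t, server_free)              # wait if the server is busy
        server_free = start + service
        total_service += service
        total_latency += server_free - t         # latency = wait + service
    return total_service / n, total_latency / n

# Service rate 100 ops/s, so mean service time is ~10 ms regardless of load.
for load in (0.5, 0.9, 0.99):
    svc, lat = simulate_mm1(arrival_rate=load * 100, service_rate=100)
    print(f"utilisation {load:.0%}: service {svc*1000:.1f} ms, latency {lat*1000:.1f} ms")
```

In this single-queue model the service time stays flat by construction; in a real system, the hidden OS-level and thread-pool queues end up inside the measured "service time", which is why it fluctuates so much when the system runs near saturation.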

As a corollary of my previous reasoning, this could indeed be the case and would make sense to test.
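A sketch of the standard M/M/1 result behind this, with $\lambda$ the arrival rate (i.e. target throughput) and $\mu$ the service rate:

```latex
\text{mean service time} = \frac{1}{\mu},
\qquad
\text{mean latency} = \frac{1}{\mu - \lambda}
                    = \frac{1/\mu}{1 - \rho},
\quad \rho = \frac{\lambda}{\mu}
```

As the utilisation $\rho$ approaches 1, latency grows without bound and small run-to-run differences in the effective service rate get amplified; lowering the target throughput reduces $\rho$, so the measurements should indeed become more stable.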


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.