Possible reason for > 30% difference between iterations of geonames -> painless_static

During my task to compare performance between dockerized Elasticsearch and Elasticsearch running as a native service / from the shell, I discovered large fluctuations between tests, even when rerunning the same test set!

First I compared the overall time of the run:

So we can see about four minutes' difference in overall time.
I dug deeper to see which operation is causing this:

There are a few operations which are slower, but painless_static is the worst offender.

Above we see throughput, latency and service time. The second run is much slower.

But I am not able to find anything suspicious in the Metricbeat dashboards:

See 12:24 - 12:28 and compare with 13:02 - 13:09.

Here is my testing process:

  • delete old geonames-index via kibana
  • stop elasticsearch
  • stop kibana
  • start elasticsearch
  • start kibana
  • wait until elasticsearch is up
  • run esrally with external car, track geonames, challenge default.

Nothing else is running on the server where elasticsearch is tested (except for metricbeat).
esrally runs on a system shared with an Elastic dev system that has very low load. But I also stopped that dev system in previous runs, and the benchmarks showed the same fluctuation.

Any help is really appreciated.
Thanks a lot, Andreas

Hi Andreas,

to understand what's happening, I suggest watching my talk The Seven Deadly Sins of Elasticsearch Benchmarking (free to watch, but requires prior registration). Please check the item "sin 3", which covers your question extensively. See also the related blog post Seven Tips for Better Elasticsearch Benchmarks, which is a summary of the talk.


Hey Daniel, thanks for the reply.

Your talk was interesting. It's clear to me that latency will go up if we query faster than the system can respond, because of the growing queue.

What is not completely clear to me is why the service time varies that much. Do you think it is caused by the overload? And if I lower the target throughput, should the values become more stable?

Regards, Andreas

The talk (and in fact Rally as well) makes a simplifying assumption, namely that the benchmarked system can be modelled with a single queue (known as an M/M/1 queue in queueing theory). In practice, systems can have several queues: incoming network packets can queue up at the OS level, runnable processes queue up in the CPU scheduler's run queue, Elasticsearch has a queue in front of its thread pool, and if multiple Elasticsearch nodes are processing a query, even more queues are involved. So service time is only an approximation (although the best one we can get from a client's perspective), and that would explain why you see a varying service time.
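As a hypothetical illustration (this is not Rally's implementation; the rates, request count and seed are made up), a minimal M/M/1 simulation shows how measured latency inflates near saturation while per-request service time stays roughly constant:

```python
import random

def simulate_mm1(arrival_rate, service_rate, n=50_000, seed=42):
    """Simulate a FIFO M/M/1 queue; return (mean service time, mean latency)."""
    rng = random.Random(seed)
    t = 0.0            # arrival time of the current request
    server_free = 0.0  # time at which the server next becomes idle
    total_service = total_latency = 0.0
    for _ in range(n):
        t += rng.expovariate(arrival_rate)       # Poisson arrivals
        service = rng.expovariate(service_rate)  # exponential service demand
        start = max(t, server_free)              # wait if the server is busy
        server_free = start + service
        total_service += service
        total_latency += server_free - t         # latency = wait + service
    return total_service / n, total_latency / n

# Service rate 100 ops/s, so mean service time is ~10 ms regardless of load.
for load in (0.5, 0.9, 0.99):
    svc, lat = simulate_mm1(arrival_rate=load * 100, service_rate=100)
    print(f"utilisation {load:.0%}: service {svc*1000:.1f} ms, latency {lat*1000:.1f} ms")
```

In this single-queue model the service time stays flat by construction; in a real system, the hidden OS-level and thread-pool queues end up inside the measured "service time", which is why it fluctuates so much when the system runs near saturation.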

As a corollary of my previous reasoning, this could indeed be the case and would make sense to test.
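A sketch of the standard M/M/1 result behind this, with $\lambda$ the arrival rate (i.e. target throughput) and $\mu$ the service rate:

```latex
\text{mean service time} = \frac{1}{\mu},
\qquad
\text{mean latency} = \frac{1}{\mu - \lambda}
                    = \frac{1/\mu}{1 - \rho},
\quad \rho = \frac{\lambda}{\mu}
```

As the utilisation $\rho$ approaches 1, latency grows without bound and small run-to-run differences in the effective service rate get amplified; lowering the target throughput reduces $\rho$, so the measurements should indeed become more stable.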


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.