Using Rally to compare existing clusters: document a way to run 'short' tests

I'm fairly new to Rally and have recently been trying it out. My goal is to compare the performance of two Elasticsearch clusters I have access to (the classic "old one" and the new "on Kubernetes one"). I was hoping for a fairly quick test suite that produces a performance rating we can use to measure the improvements from tweaking settings on the new Kubernetes cluster.

The most useful part of the documentation for this was:

https://esrally.readthedocs.io/en/stable/recipes.html?highlight=--pipeline%3Dbenchmark-only#benchmarking-an-existing-cluster

Running on my laptop, I started a single-node cluster with:

docker run --rm -ti --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.10.1

and then ran:

esrally race --pipeline=benchmark-only --target-hosts=localhost:9200

This turned out to be a rather long test, so I looked for a smaller dataset. After some tweaking I ended up with:

esrally race --pipeline=benchmark-only --target-hosts=localhost:9200 --track=percolator --include-tasks=delete-index,create-index,index,percolator_with_content_google --kill-running-processes
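If the goal is to run this same short race against both clusters (the "old one" and the Kubernetes one), a small wrapper can keep the two runs identical. This is a minimal sketch, not part of Rally: it only reuses the flags from the command above, `quick_race_cmd` and `run_quick_races` are hypothetical helper names, and the hosts `old-cluster:9200` and `k8s-cluster:9200` are placeholders. By default it only prints the commands; passing `run=True` invokes `esrally` through `subprocess`.

```python
from __future__ import annotations

import shlex
import subprocess

# Tasks from the short percolator run above; adjust them to match your own workload.
QUICK_TASKS = ("delete-index", "create-index", "index", "percolator_with_content_google")


def quick_race_cmd(target_hosts: str, track: str = "percolator",
                   tasks: tuple[str, ...] = QUICK_TASKS) -> list[str]:
    """Build the esrally argument list for a short benchmark-only race (hypothetical helper)."""
    return [
        "esrally", "race",
        "--pipeline=benchmark-only",  # benchmark an existing cluster instead of provisioning one
        f"--target-hosts={target_hosts}",
        f"--track={track}",
        "--include-tasks=" + ",".join(tasks),
        "--kill-running-processes",
    ]


def run_quick_races(clusters: dict[str, str], run: bool = False) -> None:
    """Print, and optionally execute, the same quick race against each labelled cluster."""
    for label, hosts in clusters.items():
        cmd = quick_race_cmd(hosts)
        print(f"[{label}] {shlex.join(cmd)}")
        if run:
            # check=True raises CalledProcessError if esrally exits with a non-zero status
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder host names; replace them with your real clusters before using run=True.
    run_quick_races({"old": "old-cluster:9200", "kubernetes": "k8s-cluster:9200"})
```

Keeping the task list in one place means both clusters are always measured with exactly the same workload, which is what makes the two sets of numbers comparable.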

This seems to give me some performance numbers in 69 seconds. 🎉

Am I doing this right? Could this be featured in the documentation as a "Quickstart: get some raw numbers without waiting too long" recipe?

Hello Arthur,

Thank you for your interest in Rally! In general, each Rally track evaluates the performance of a particular feature. For example, the percolator track evaluates the performance of percolator queries. So you might want to pick a track that is close to your actual workload. esrally list tracks shows a short description of each track and its challenges. For a more detailed description, please take a look at the README files under each track in the rally-tracks repository.

We also have an --exclude-tasks option, so you could run a smaller subset of queries to speed up the test. In addition, many tracks have challenges that only measure indexing throughput (they have index-only in the name); these can be faster if indexing performance is what you are tuning for. Note, however, that we don't use indexing throughput as a metric for the percolator and noaa tracks.
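As a rough illustration of the two options above, the sketch below builds an `esrally race` command that either selects an indexing-only challenge with `--challenge` or skips some tasks with `--exclude-tasks`. The helper `short_race_cmd` is hypothetical, and the track name `geonames`, the challenge name `append-no-conflicts-index-only` and any excluded task names are assumptions for illustration only; check `esrally list tracks` and the track's README for the names that actually exist in your Rally version.

```python
from __future__ import annotations

import shlex


def short_race_cmd(target_hosts: str, track: str, challenge: str | None = None,
                   exclude_tasks: tuple[str, ...] = ()) -> list[str]:
    """Build an esrally command for a shorter race against an existing cluster (hypothetical helper).

    challenge:     e.g. an index-only challenge if indexing throughput is all you care about
    exclude_tasks: task names to skip, passed as a comma-separated list to --exclude-tasks
    """
    cmd = [
        "esrally", "race",
        "--pipeline=benchmark-only",
        f"--target-hosts={target_hosts}",
        f"--track={track}",
    ]
    if challenge:
        cmd.append(f"--challenge={challenge}")
    if exclude_tasks:
        cmd.append("--exclude-tasks=" + ",".join(exclude_tasks))
    return cmd


if __name__ == "__main__":
    # Assumed track and challenge names; verify them with `esrally list tracks` first.
    print(shlex.join(short_race_cmd("localhost:9200", "geonames",
                                    challenge="append-no-conflicts-index-only")))
```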

For general benchmarking advice, you might also want to read our blog post "Seven Tips for Better Elasticsearch Benchmarks".

Evgenia

Hi @Evgenia_Badiyanova, thanks for your reply.

I understand that Rally is designed to test very specific things and is tailored for advanced benchmarks; I might need that at some point, and it feels like a really good tool for it. That said, my two "naive" questions about performance are "how fast is indexing?" and "how fast is a simple search?" (we used https://locust.io/ at some point for the latter).

Do you think Rally could offer a track or dataset that gives a quick, summarised first impression (not a detailed study) of these metrics, a bit like what pgbench (PostgreSQL: Documentation: 13: pgbench) provides as a naive approach to PostgreSQL performance?

This is primarily to have a simple way of communicating with a hosting service or IT department, whether to compare existing and new services or to check the effect of changing a cluster characteristic (for example: "Hey, I've added some RAM, is it faster?" "Me: I ran the Rally test, we went from X to Y, that's good!").
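For the "we went from X to Y" part, Rally has an `esrally compare` subcommand that takes `--baseline` and `--contender` race identifiers (listed by `esrally list races`) and reports the differences between two races. If you only want a single headline number to share, the arithmetic is a relative change whose sign depends on whether higher is better (throughput) or lower is better (latency). The helper name and the figures below are hypothetical, chosen only to show the calculation.

```python
def relative_change(baseline: float, contender: float, higher_is_better: bool = True) -> float:
    """Improvement of contender over baseline, in percent (positive means better)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (contender - baseline) * 100.0 / baseline
    # For latency-style metrics a decrease is an improvement, so flip the sign.
    return change if higher_is_better else -change


if __name__ == "__main__":
    # Hypothetical figures: indexing throughput in docs/s, search latency in ms.
    print(f"indexing throughput: {relative_change(1000, 1200):+.1f}%")
    print(f"search latency:      {relative_change(50, 40, higher_is_better=False):+.1f}%")
```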