How to design tracks based on our real usage?

benelastic · November 16, 2023, 3:05am

Rally has provided a bunch of out-of-box default tracks. But we may not use those tracks to benchmark our ES deployment, right? What are the best practices to benchmark ES based on our actual scenarios? Is it necessary to create custom tracks with our data, or just using the default ones will be ok?

After creating a custom track with business data, how to define the operations based on the actual scenario? So that the benchmark report can mostly reflect the actual usage?

Bradley_Deam · November 20, 2023, 5:52am

Hi @benelastic - this is quite an involved question with no simple answer. Let me try to answer inline:

Rally has provided a bunch of out-of-box default tracks. But we may not use those tracks to benchmark our ES deployment, right?

You're of course free to apply these workloads against Elasticsearch, but you need to be mindful that the data model, index size, operations performed etc. may not actually represent your usage of Elasticsearch.

For example, if you have an observability-based use case with lots of application logs, then the results from running a benchmark like geonames are probably of less value, given that the track focuses on a completely different use case, and you'd likely be better off running http_logs.

What are the best practices to benchmark ES based on our actual scenarios?

We have a public talk on common pitfalls that I would strongly recommend watching: The Seven Deadly Sins of Elasticsearch Benchmarking.

Is it necessary to create custom tracks with our data, or just using the default ones will be ok?

After creating a custom track with business data, how to define the operations based on the actual scenario? So that the benchmark report can mostly reflect the actual usage?

Yes, a custom track is the way forward here if you want to benchmark something that is as close as possible to your own workload. Unfortunately there's no "automagic" way to define a track that is exactly the same as your production usage. Instead, you'll have to decide on and define the operations that make sense for your workload - this is going to require trial and error.
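As a rough illustration of what "defining the operations" can look like, here is a minimal Python sketch that turns a hand-collected list of representative queries into a track definition. The top-level keys (`indices`, `corpora`, `schedule`) and the operation types follow the track format described in the Rally docs, but everything else is an assumption made up for the example: the index name, the two queries, the request rates, the client counts and the file names `index.json` and `documents.json`. Treat it as a starting point for trial and error, not a recipe.

```python
import json

# Hypothetical observations from production: operation name -> (query body, requests/s).
# Replace these with queries and rates measured on your own cluster.
OBSERVED_SEARCHES = {
    "recent-errors": (
        {"query": {"bool": {"filter": [
            {"term": {"level": "error"}},
            {"range": {"@timestamp": {"gte": "now-15m"}}},
        ]}}},
        40,
    ),
    "status-histogram": (
        {"size": 0, "aggs": {"by_status": {"terms": {"field": "status"}}}},
        5,
    ),
}


def build_track(index_name, corpus_file, doc_count, searches):
    """Build a Rally track as a dict: set up the index, bulk-index, then run searches."""
    schedule = [
        {"operation": {"operation-type": "delete-index"}},
        {"operation": {"operation-type": "create-index"}},
        {"operation": {"operation-type": "cluster-health",
                       "request-params": {"wait_for_status": "green"},
                       "retry-until-success": True}},
        # Bulk size and client count are placeholders; tune them to your ingest pattern.
        {"operation": {"operation-type": "bulk", "bulk-size": 5000},
         "warmup-time-period": 120, "clients": 8},
    ]
    for name, (body, ops_per_sec) in searches.items():
        schedule.append({
            "operation": {"name": name, "operation-type": "search", "body": body},
            "clients": 4,
            "warmup-iterations": 100,
            "iterations": 1000,
            # Throttle each search to roughly the rate seen in production.
            "target-throughput": ops_per_sec,
        })
    return {
        "version": 2,
        "description": "Sketch of a custom track derived from observed usage",
        "indices": [{"name": index_name, "body": "index.json"}],
        "corpora": [{"name": index_name,
                     "documents": [{"source-file": corpus_file,
                                    "document-count": doc_count}]}],
        "schedule": schedule,
    }


if __name__ == "__main__":
    track = build_track("app-logs", "documents.json", 1_000_000, OBSERVED_SEARCHES)
    print(json.dumps(track, indent=2))
```

The output would be saved as `track.json` next to the index settings and the document corpus, and the directory passed to Rally with `--track-path`, for example together with `--pipeline=benchmark-only` and `--target-hosts` when benchmarking an existing cluster. Expect to revisit the queries, rates and client counts several times before the results resemble what you see in production.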

Benchmarking and load testing are quite complex and vast topics, so I acknowledge that this advice all seems a bit hand-wavy and generic, but this is really an advanced use case, designed for users who want to test/benchmark specific aspects of their cluster.

If you haven't already seen it, we have docs on creating a custom track, and you can use the existing tracks as inspiration for structuring etc:

Define Custom Workloads: Tracks - Rally 2.10.0.dev0 documentation

benelastic · November 20, 2023, 8:56am

Hi @Bradley_Deam
Thanks so much for the reply.
Yes, I have read through all the docs that I can find, as well as the discussions on the GitHub repository. Thanks for the detailed advice regarding my questions. This is quite helpful for me. So I will aim to create a custom track and put some operations based on my use cases into it.

