Hi @benelastic - this is quite an involved question with no simple answer. Let me try to answer inline:
Rally has provided a bunch of out-of-box default tracks. But we may not use those tracks to benchmark our ES deployment, right?
You're of course free to run these workloads against Elasticsearch, but you need to be mindful that the data model, index size, operations performed, etc. may not actually represent your usage of Elasticsearch.
For example, if you have an observability-based use case with lots of application logs, then the results from running a benchmark like geonames are probably of less value, given that the track focuses on a completely different use case, and you'd likely be better off running
What are the best practices to benchmark ES based on our actual scenarios?
We have a public talk on common pitfalls that I would strongly recommend watching: The Seven Deadly Sins of Elasticsearch Benchmarking.
Is it necessary to create custom tracks with our data, or just using the default ones will be ok?
After creating a custom track with business data, how to define the operations based on the actual scenario? So that the benchmark report can mostly reflect the actual usage?
Yes, a custom track is the way forward here if you want to benchmark something that is as close as possible to your own workload. Unfortunately there's no "automagic" way to define a track that is exactly the same as your production usage. Instead, you'll have to decide on and define the operations that make sense for your workload - this is going to require trial and error.
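To make "define the operations" a bit more concrete, here is a rough sketch that builds a minimal track definition in Python and prints it as JSON. The key names follow the track format as I understand it from the Rally docs, so please check them against the track reference for your Rally version. Everything else is a hypothetical placeholder: the index name `app-logs`, the files `index.json` and `documents.json`, the document counts, the `level` and `@timestamp` fields, and the client/throughput numbers. You'd replace them with values derived from your own data and query patterns.

```python
import json


def build_track(index_name, doc_file, doc_count, uncompressed_bytes):
    """Return a minimal Rally-style track as a dict.

    All names and numbers passed in are illustrative assumptions,
    not values taken from a real deployment.
    """
    # Hypothetical query: recent error-level log lines. Replace with
    # queries that your own applications actually send.
    recent_errors_query = {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "error"}},
                    {"range": {"@timestamp": {"gte": "now-15m"}}},
                ]
            }
        }
    }
    return {
        "version": 2,
        "description": "Sketch of a custom track for a log-search workload",
        "indices": [{"name": index_name, "body": "index.json"}],
        "corpora": [
            {
                "name": index_name,
                "documents": [
                    {
                        "source-file": doc_file,
                        "document-count": doc_count,
                        "uncompressed-bytes": uncompressed_bytes,
                    }
                ],
            }
        ],
        "schedule": [
            # Start from a clean index so runs are comparable.
            {"operation": {"operation-type": "delete-index"}},
            {"operation": {"operation-type": "create-index"}},
            # Indexing phase: bulk size and client count are guesses to tune.
            {
                "operation": {"operation-type": "bulk", "bulk-size": 5000},
                "warmup-time-period": 120,
                "clients": 8,
            },
            # Query phase: throttled to an assumed target throughput.
            {
                "operation": {
                    "name": "recent-errors",
                    "operation-type": "search",
                    "body": recent_errors_query,
                },
                "clients": 4,
                "warmup-iterations": 100,
                "iterations": 1000,
                "target-throughput": 50,
            },
        ],
    }


track = build_track("app-logs", "documents.json", 1000000, 250000000)
track_json = json.dumps(track, indent=2)
print(track_json)
```

The printed JSON would be saved as `track.json` next to the index settings and the document corpus. The trial-and-error part is mostly in the `schedule`: adjusting the mix of operations, the number of clients and the target throughput until the load resembles what your cluster sees in production.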
Benchmarking and load testing are quite complex and vast topics, so I acknowledge that this advice all seems a bit hand-wavy and generic, but this really is an advanced use case, aimed at users who want to test/benchmark specific aspects of their cluster.
If you haven't already seen it, we have docs on creating a custom track, and you can use the existing tracks as inspiration for structuring etc: