Is there a recommended pipeline to benchmark an existing cluster?

xandmaga · May 20, 2020, 12:27pm

We've built an Elasticsearch cluster and indexed our data. How do I benchmark this cluster?

Suppose we already have our tracks, we only want to evaluate search latency, and we know the throughput we'll need.

Do I use the Rally daemon with a custom configuration to build another cluster from scratch? And if this is the best option, do I index all of the data in it, or only part of it?
Or do I use the benchmark-only pipeline to evaluate the existing cluster?

Is there any search latency metric that I can't get with the second option and that I should be considering when configuring my cluster?

dliappis · May 20, 2020, 2:06pm

If you want to benchmark an existing cluster you should use the benchmark-only pipeline. Make sure of course that you don't run Rally on the same machine as any of the Elasticsearch nodes (while at it, also make sure you've familiarized yourself with the basic benchmarking gotchas eloquently depicted in the 7 deadly sins of benchmarking presentation).
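As a rough illustration of the benchmark-only setup, here is a minimal sketch that only assembles the Rally command line (it does not run anything). The host addresses and track directory are hypothetical placeholders, and the `race` subcommand is an assumption about the installed Rally version: older releases take the same flags without it.

```python
from typing import List, Optional


def benchmark_only_command(
    target_hosts: List[str],
    track_path: str,
    challenge: Optional[str] = None,
) -> List[str]:
    """Build an esrally command line that benchmarks an existing cluster."""
    if not target_hosts:
        raise ValueError("at least one target host is required")
    cmd = [
        "esrally",
        "race",  # assumption: recent Rally; older releases omit this subcommand
        "--pipeline=benchmark-only",  # Rally does not provision the cluster
        "--target-hosts=" + ",".join(target_hosts),
        "--track-path=" + track_path,  # directory holding your own track
    ]
    if challenge is not None:
        cmd.append("--challenge=" + challenge)
    return cmd


if __name__ == "__main__":
    # Hypothetical hosts and track directory; replace with your own.
    print(" ".join(benchmark_only_command(
        ["10.0.0.11:9200", "10.0.0.12:9200"],
        "/home/me/tracks/search",
    )))
```

Run the resulting command from a machine that is not one of the Elasticsearch nodes.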

If there are issues with scalability of the load driver (highly doubtful, since you want to evaluate search latency rather than indexing throughput) you can consider distributing the load driver across several machines.
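If you do end up distributing the load driver, the setup could be sketched roughly as below. This again only builds command lines: one `esrallyd start` per load-driver machine, plus the `--load-driver-hosts` flag to add to the benchmark invocation. All IP addresses are hypothetical, and it is worth checking the flags against the documentation for your Rally version.

```python
from typing import List


def daemon_commands(coordinator_ip: str, driver_ips: List[str]) -> List[List[str]]:
    """Return one `esrallyd start` command per machine, coordinator first."""
    ips = [coordinator_ip] + [ip for ip in driver_ips if ip != coordinator_ip]
    return [
        ["esrallyd", "start", f"--node-ip={ip}", f"--coordinator-ip={coordinator_ip}"]
        for ip in ips
    ]


def load_driver_flag(driver_ips: List[str]) -> str:
    """Flag to append to the esrally invocation on the coordinator."""
    return "--load-driver-hosts=" + ",".join(driver_ips)


if __name__ == "__main__":
    # Hypothetical addresses: 10.5.5.5 coordinates, the other two generate load.
    for cmd in daemon_commands("10.5.5.5", ["10.5.5.10", "10.5.5.11"]):
        print(" ".join(cmd))
    print(load_driver_flag(["10.5.5.10", "10.5.5.11"]))
```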

xandmaga · May 21, 2020, 9:57pm

Thank you for your response. I've watched the video Benchmarking Elasticsearch with Rally, but I didn't know about this presentation. We are just starting our benchmarking process; in the future we'll certainly need to deal with indexing too, so thanks for the tips.

Suppose that we run our track and don't get the desired latency. How would we modify the cluster based on our results? From what I understood reading the docs, with benchmark-only we don't get measurements from the telemetry devices.
In the benchmark-only scenario, will we have to guess which configuration to tweak? For example, make a priority list of changes and rerun the benchmark after each one?

For instance:

First, increase the heap, and then run the track again
Next, change the number of shards or the size of each one, and then run the track again

Something like this?
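To keep such a change-and-rerun loop comparable, I imagine recording the search latencies of each run and ranking the configurations by a high percentile. A minimal sketch, assuming the latencies have already been exported as plain lists of milliseconds; the configuration names and numbers are made up for illustration, not measured results.

```python
import math
from typing import Dict, List, Tuple


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def rank_configurations(
    runs: Dict[str, List[float]], pct: float = 99.0
) -> List[Tuple[str, float]]:
    """Sort configurations by their pct-th percentile latency, best first."""
    scored = [(name, percentile(latencies, pct)) for name, latencies in runs.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up latencies in milliseconds, one list per benchmark run.
    runs = {
        "baseline": [120.0, 135.0, 150.0, 410.0],
        "bigger-heap": [118.0, 130.0, 149.0, 390.0],
        "smaller-shards": [80.0, 95.0, 110.0, 240.0],
    }
    for name, value in rank_configurations(runs):
        print(f"{name}: p99 = {value} ms")
```

Changing one setting per run keeps each latency difference attributable to that setting.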

Christian_Dahlqvist · May 24, 2020, 8:46pm

If you have a search use case, it might be worthwhile looking at the methodology described in this old ElasticON talk. As queries are executed single-threaded against each shard, the first step is often to see how search latency depends on shard size. Make sure you vary the queries so that the results are not all served from cache. Once you find a good shard size (or range), set up a cluster and see how many nodes you need in order to handle all your data (generally with one replica). You can then see how much query throughput this small cluster can handle while responding within your SLA. Once you have established this, you can scale out nodes and replicas to handle larger throughput.
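The arithmetic behind those steps can be sketched as follows. This is a back-of-the-envelope estimate, not a sizing rule: it assumes query throughput grows roughly linearly with the number of shard copies, which you would need to confirm by benchmarking. The function name, parameters, and example numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingEstimate:
    primary_shards: int
    copies_per_shard: int  # primary plus replicas
    total_shards: int


def estimate_sizing(
    total_data_gb: float,
    shard_size_gb: float,
    measured_qps: float,
    target_qps: float,
    base_copies: int = 2,  # one primary plus one replica in the small cluster
) -> SizingEstimate:
    """Estimate shard counts from a benchmarked shard size and throughput."""
    if shard_size_gb <= 0 or measured_qps <= 0:
        raise ValueError("shard_size_gb and measured_qps must be positive")
    # Step 1: enough primaries to hold all data at the chosen shard size.
    primary_shards = math.ceil(total_data_gb / shard_size_gb)
    # Step 2: how many times the small cluster's throughput is needed.
    # Assumption: throughput scales linearly with the number of copies.
    scale = max(1, math.ceil(target_qps / measured_qps))
    copies = base_copies * scale
    return SizingEstimate(primary_shards, copies, primary_shards * copies)


if __name__ == "__main__":
    # Hypothetical inputs: 1 TB of data, 30 GB shards, 50 qps measured
    # within SLA on the small cluster, 120 qps required.
    print(estimate_sizing(1000, 30, measured_qps=50, target_qps=120))
```

The number of nodes then follows from how many shards of that size one node served within SLA in the small-cluster test.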

xandmaga · May 26, 2020, 12:59pm

Thank you, I was looking for something like this in case my existing cluster can't give me the latency I need.

