Rally: How does the number of requests get calculated in a cluster?

Hi,

We want to use clustered Rally to run a really large scaling test (n=100 ES nodes, ? load drivers) across different corpora and filesystem settings.

In order to accomplish that, I have a few questions regarding the actual load generation:

  1. For the default benchmarks (let's say nyc_taxis), do I understand correctly that if the clients parameter is not specified in the challenge, the load generation will be single-threaded?

  2. Furthermore, if I set the number of "clients" to 4 and I have two load drivers, do I have 4 or 8 threads sending concurrently?

  3. Analogously: if my challenge operation specifies "iterations": 1000, does this mean 1000 iterations in total or 1000 iterations per load driver?

  4. Is the worker parallelization multi-processing (no GIL) or multi-threading (GIL)? If the latter, is there a nice way to run two daemons on the same machine without any container isolation?

Also, if the number of "clients" is the number of global clients, what would happen if "clients" < number of Rally daemons available? :smiley:

Hi @lquenti, thanks for your interest in Rally and the detailed questions. Let me try to answer these inline.

  1. For the default benchmarks (let's say nyc_taxis), do I understand correctly that if the clients parameter is not specified in the challenge, the load generation will be single-threaded?

Yes, with the caveat that each track exposes different parameters, and you can adjust the concurrency of each task via the clients option. Some tasks do not let you adjust the concurrency through track params because there is no template variable for it.
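
For illustration, a minimal sketch of raising concurrency via track params, assuming the track's template exposes a bulk_indexing_clients variable (nyc_taxis does in recent versions, but check the track's README for the exact parameter names):

```sh
# Hypothetical invocation: bulk_indexing_clients only takes effect if the
# track template actually references it; otherwise clients stays at 1.
esrally race --track=nyc_taxis --track-params="bulk_indexing_clients:8" \
  --target-hosts=127.0.0.1:9200 --pipeline=benchmark-only
```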

  2. Furthermore, if I set the number of "clients" to 4 and I have two load drivers, do I have 4 or 8 threads sending concurrently?

From Rally's perspective, it allocates clients across all available Workers, and another load driver machine just means more Workers, so you'd only have 4.
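
To make that concrete, here's a hedged sketch of a two-load-driver setup (host IPs hypothetical); the task-level clients value still caps total concurrency at 4, however many machines host the Workers:

```sh
# Start the Rally daemon on each load driver machine first, e.g.:
#   esrallyd start --node-ip=10.0.0.11 --coordinator-ip=10.0.0.10
# Then race from the coordinator; clients are spread across both machines.
esrally race --track=nyc_taxis --load-driver-hosts=10.0.0.11,10.0.0.12 \
  --target-hosts=es-node-1:9200
```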

  3. Analogously: if my challenge operation specifies "iterations": 1000, does this mean 1000 iterations in total or 1000 iterations per load driver?

This applies to the task as a whole: it means 1000 iterations in total, not 1000 per load driver.
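
As a reference point, a minimal sketch of such a task in a challenge's schedule (operation name and values hypothetical); per the above, the 1000 iterations count for the task as a whole rather than per load driver:

```json
{
  "operation": "default-query",
  "clients": 4,
  "warmup-iterations": 100,
  "iterations": 1000
}
```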

  4. Is the worker parallelization multi-processing (no GIL) or multi-threading (GIL)? If the latter, is there a nice way to run two daemons on the same machine without any container isolation?

We use multi-processing. By default the Rally daemon starts a Worker (i.e. a separate Python process) per available core on the machine (see the available.cores setting to override this), and then starts a single async event loop per Worker.
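
For example, a minimal sketch of capping the Worker count in ~/.rally/rally.ini, assuming the setting lives in the [system] section:

```ini
[system]
; Limit the Rally daemon to 8 Workers (one Python process each)
; instead of one per available core.
available.cores = 8
```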

Also, if the number of "clients" is the number of global clients, what would happen if "clients" < number of Rally daemons available? :smiley:

You'd just have Workers without any allocated tasks.

FWIW, regarding multi-machine load driver setups: in our experience it's uncommon to actually require them unless you're aiming to run very large scale benchmarks with hundreds of thousands, if not millions, of requests per second. Depending on the track and workload, it's entirely feasible for a single 8-core machine to simulate thousands of clients without being the bottleneck.

The 'heavier' the track, the more CPU time you'll need on the load driver. Probably our most resource-intensive track is elastic/logs, which dynamically generates documents during indexing, and even with this overhead we've simulated ~1M docs/s on a single 32-core load driver (1 GB/s of network traffic).
