Distributing the load test driver: Docker container limitation

As a preamble: I'm relatively new to Elasticsearch and load testing, and by no means a cloud infrastructure expert...

I've had success with esrally, creating a new track to test a custom search query load, both against a local node that Rally creates and against our "dev" cluster.

My next step is to try out distributed load drivers, with the ultimate goal of simulating our current average load, then 10x, 100x, etc., and seeing where things fall apart.

I see that there is a documented limitation when using the Docker image (Running Rally with Docker - Rally 2.8.0.dev0 documentation) to distribute the load test driver. Running Rally in Docker seems like such an obviously convenient way to temporarily stand up a distributed load test driver, so I have some questions to make sure I'm not missing something...

  1. Is there any way to work around this limitation? In other words, could I package my own Docker image with the needed dependencies, whose entrypoint ran the appropriate esrallyd commands? Why wouldn't that work?

  2. The examples and other posts I find about distributing the load driver are typically setting up 2-3 instances. Since my goal is to simulate production traffic, I think many more instances would be appropriate. The rate of requests per client would obviously be much less, but I'm sure there are many ES clusters that have thousands of clients making requests per second. A distributed load driver network of 100 clients seems feasible. What is an appropriate way to size the distributed load driver network?

  3. If there are indeed Rally tests using tens or hundreds of distributed load drivers, what is the recommended way to set it up? A container just seems like a natural part of automating that.

  4. If we have an ingress/load balancer already configured via cloud infrastructure, such that our clients access via a single address, is there any point in running with the individual hosts separated out in the --target-hosts list?

Sorry for the long post and thanks for any help!

Hi @Rich2023, thank you for your interest in Rally, as well as the thoughtful and detailed post!

  1. Is there any way to work around this limitation? In other words, could I package my own Docker image with the needed dependencies, whose entrypoint ran the appropriate esrallyd commands? Why wouldn't that work?
  2. The examples and other posts I find about distributing the load driver are typically setting up 2-3 instances. Since my goal is to simulate production traffic, I think many more instances would be appropriate. The rate of requests per client would obviously be much less, but I'm sure there are many ES clusters that have thousands of clients making requests per second. A distributed load driver network of 100 clients seems feasible. What is an appropriate way to size the distributed load driver network?

The distributed load driver functionality is not something that's gotten a lot of love recently, particularly because of some significant refactoring we did a couple of years ago that allows a single Rally instance to simulate thousands to tens of thousands of clients; the exact number depends on the workload and the hardware you're using to run Rally.

When we do decide to run Rally in a distributed mode internally at Elastic, we often elect to run it on dedicated machines to ensure that we don't encounter any kind of 'noisy neighbour' situation that might affect the results, something that can be particularly hard to avoid when running workloads on container orchestrators like K8s.

That's not to say you couldn't create a customised Docker image that sets up multiple instances of Rally in the distributed configuration; it's just not something we focused on with the official image that we publish. Pull requests welcome :smiley:.
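If you did roll your own, the entrypoint would essentially need to wrap the same steps the docs describe for bare-metal hosts. A rough sketch, with placeholder IPs and track name, assuming each container has a stable, routable IP that the esrallyd daemons can reach each other on:

# On the coordinator (placeholder IP 10.0.0.1):
esrallyd start --node-ip=10.0.0.1 --coordinator-ip=10.0.0.1

# On each worker (e.g. placeholder IP 10.0.0.2):
esrallyd start --node-ip=10.0.0.2 --coordinator-ip=10.0.0.1

# Then, from the coordinator, run the race against an existing cluster:
esrally race --track=your-track --pipeline=benchmark-only \
  --load-driver-hosts=10.0.0.1,10.0.0.2 \
  --target-hosts=es-node-1:9200

The part you'd need to solve yourself is making those daemons addressable to each other and starting them in the right order, which is where containers make things awkward.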

  3. If there are indeed Rally tests using tens or hundreds of distributed load drivers, what is the recommended way to set it up? A container just seems like a natural part of automating that.

Not specific to distributing load drivers, but we do have a great talk about the many pitfalls of benchmarking Elasticsearch that's worth watching.

  4. If we have an ingress/load balancer already configured via cloud infrastructure, such that our clients access via a single address, is there any point in running with the individual hosts separated out in the --target-hosts list?

No need to. If you have a LB/single entry point to your K8s cluster, then you can just specify that as the target host. The intention of --target-hosts is to allow Rally to load balance/round-robin requests amongst the specified nodes (such as a coordinator tier), but that paradigm doesn't really work with container orchestrators like K8s.
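For example, with placeholder hostnames:

# Let Rally round-robin requests across individual coordinating nodes:
esrally race --track=your-track --pipeline=benchmark-only \
  --target-hosts=es-coord-1:9200,es-coord-2:9200,es-coord-3:9200

# Or just point everything at the single LB/ingress address:
esrally race --track=your-track --pipeline=benchmark-only \
  --target-hosts=es.internal.example:9200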

Thanks @Bradley_Deam, that's a very helpful response. I'm actually happy with that answer, because I have already played with changing the number of clients in my track definition (snippet below), and it seems much simpler than setting up the distributed load driver. One more question: if production-level load tests are being done with a single instance of Rally simulating thousands of clients, what kind of hardware (or cloud resources) can effectively drive tens of thousands of clients? Maybe this is something that video will answer, or it varies too much to give an answer here, but I would guess that I couldn't run this kind of test from my dev laptop on my home network. Either the machine or my internet connection must be a bottleneck compared to simulating a production traffic load, right?
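For context, by "changing the number of clients" I mean bumping the clients property on the search task in my track's schedule, something like this (the operation name is just a placeholder):

"schedule": [
  {
    "operation": "my-custom-search",
    "clients": 100,
    "target-throughput": 1000
  }
]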

One more question: if production-level load tests are being done with a single instance of Rally simulating thousands of clients, what kind of hardware (or cloud resources) can effectively drive tens of thousands of clients?

It's really impossible to give a definitive answer here, but I did some digging internally to see what we've had success with, and saw that we were able to run ~20,000 simulated clients on a single m5d.4xlarge AWS instance (16 vCPUs, 64GB RAM, 2 x 300GB NVMe SSD).

Unfortunately I don't have any metrics to analyse whether we actually required all of that memory; I say that because we tend to over-provision the load driver instances to ensure that they're not the bottleneck. I'd suggest allocating a good number of CPU cores/quota to the load generator for high client counts.

Maybe this is something that video will answer, or it varies too much to give an answer here, but I would guess that I couldn't run this kind of test from my dev laptop on my home network.

Heh, even if you could run the benchmark from your laptop, it's probably not the most scientific approach.

I think in this case you'll need a dedicated machine/pod, as just creating 20,000 TCP connections alone will usually require some sysctl tuning on both the load driver and the target node(s).

This Red Hat article has some good suggestions (kernel: Possible SYN flooding on port #. Sending cookies. - Red Hat Customer Portal), but we set the following:

# /etc/sysctl.conf
net.ipv4.tcp_syncookies = 0           # disable SYN cookies; rely on the larger backlogs below instead
net.core.somaxconn = 4096             # accept queue per listening socket; at least 2600 is needed for 20000 connections, Ubuntu default is 128
net.ipv4.tcp_max_syn_backlog = 20480  # max half-open (SYN_RECV) connections
net.core.netdev_max_backlog = 10000   # max packets queued on the receive side before the kernel processes them
fs.file-max = 500000                  # system-wide open file handle limit; each TCP connection uses one
vm.max_map_count = 262144             # max memory map areas per process (also Elasticsearch's required minimum)
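
These can be applied without a reboot with:

sudo sysctl -p /etc/sysctl.conf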

Either the machine or my internet connection must be a bottleneck compared to simulating a production traffic load, right?

Correct. See the video I linked earlier, which explores why running benchmarks from a laptop (especially a benchmark simulating thousands of clients!) is a bad idea in general :slight_smile:


Great, thanks again for your response!
