Esrally client option

A Rally track has an option to specify the number of clients for a given operation:

"schedule": [
    {
      "operation": "force-merge",
      "clients": 1 // here
    },
    {
      "operation": "match-all-query",
      "clients": 4,
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 100
    }
]

So when I configure 4 clients in the track file, does each client have a different IP address? Or do they all share one IP address, the same as that of the server where esrally is installed?

Also, is there any source code that assigns the clients' addresses (not the ES client)? I looked through the esrally GitHub repository but couldn't find it.

So when I configure 4 clients in the track file, does each client have a different IP address? Or do they all share one IP address, the same as that of the server where esrally is installed?

No, it will use the same IP address under the hood.

Also, is there any source code that assigns the clients' addresses (not the ES client)? I looked through the esrally GitHub repository but couldn't find it.

Rally has a very thin abstraction on top of the Elasticsearch Python client, which you can find here.

It doesn't have specific logic about which interface it binds to per connection; it relies on the ES Python client. I believe the interface used will be whatever the OS decides by default, e.g. on Linux what ip route get <target ip> shows.
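To check which source address the OS would pick for a given Elasticsearch node, a small script can ask the kernel directly. This is a minimal sketch and not part of Rally or the Elasticsearch client; the function name `local_source_ip` and the default port 9200 are assumptions made for illustration.

```python
import socket


def local_source_ip(target_host: str, target_port: int = 9200) -> str:
    """Return the local IPv4 address the OS would use to reach target_host."""
    # Connecting a UDP socket sends no packets. It only asks the kernel to
    # choose a route, and therefore a source address, for the target.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((target_host, target_port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Replace the loopback address with the IP of one of your ES nodes.
    print(local_source_ip("127.0.0.1"))
```

Every connection opened from the same machine to the same target gets the same answer, no matter how many Rally clients are configured.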

So you mean when I configure the track file like this

"schedule": [
    {
      "operation": "match-all-query",
      "clients": 4, // four client
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 100
    }
]

The clients issuing the "match-all-query" operations are created by this source code, and they will all have the same IP address, right?

If all clients use the same IP address, is there any point in testing Elasticsearch with Rally? I thought Elasticsearch master nodes distribute requests depending on the client's IP address. I believed the test is only meaningful when Elasticsearch handles multiple clients with different, unique IPs, so that it can assign the operations to as many nodes as possible. Please correct me if I'm wrong.

Oh, I see the confusion, apologies. You are asking about the target IP address (i.e. Elasticsearch's, aka the foreign address), whereas I thought you were asking about the local address that Rally binds to.

When you've provided a list of IP addresses via Rally's --target-hosts CLI option, then again Rally doesn't do anything special but relies on the Elasticsearch Python client's logic which, by default, is to connect in a round-robin fashion. If you are using Elastic Cloud (if not, give it a try!) there will be a single endpoint that has the smarts to do the load balancing and routing.
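The round-robin idea can be sketched in a few lines. This only illustrates the selection pattern and is not the Elasticsearch Python client's actual implementation; `assign_requests` and the host addresses are hypothetical.

```python
from itertools import cycle, islice
from typing import List


def assign_requests(hosts: List[str], request_count: int) -> List[str]:
    """Return the host chosen for each request when cycling through hosts in order."""
    # cycle() repeats the host list endlessly; islice() takes one host per request.
    return list(islice(cycle(hosts), request_count))


if __name__ == "__main__":
    # Hypothetical node addresses, as they might be passed to --target-hosts.
    hosts = ["10.0.0.1:9200", "10.0.0.2:9200", "10.0.0.3:9200"]
    for number, host in enumerate(assign_requests(hosts, 6), start=1):
        print(f"request {number} -> {host}")
```

Note that the selection depends only on the list of target hosts, not on the source address of the machine sending the requests.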

I'm not talking about the client-server relationship between Rally and Elasticsearch. I thought the --target-hosts command-line option specifies which Elasticsearch nodes esrally connects to, and that isn't what I'm asking about.


When I use the word 'client', I mean the application that sends search or indexing operations to the Elasticsearch cluster. In the ELK stack that would be Logstash, and in EFK it would be Fluentd. If there are three Logstash servers, each on its own EC2 instance, then the Elasticsearch nodes receive requests from three different IP addresses.


In Rally's case, I thought we do not need to set up such a client server (Fluentd or Logstash, as I said) to send requests to Elasticsearch; we just need to specify the 'clients' option in the track file.

  "schedule": [
    {
      "operation": "force-merge",
      "clients": 1
    },
    {
      "operation": "match-all-query",
      "clients": 8, // here
      "warmup-iterations": 1000,
      "iterations": 1000,
      "target-throughput": 1000 // also here
    }
  ]

if you specify 'target-throughput: 1000' with 8 clients, it means that each client will issue 125 (= 1000 / 8) requests per second. In total, all clients will issue 1000 requests each second.
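The arithmetic in that paragraph can be written out as a short sketch. The function names are illustrative and are not part of Rally.

```python
def per_client_throughput(target_throughput: float, clients: int) -> float:
    """Requests per second each client must issue to reach the total target."""
    if clients <= 0:
        raise ValueError("clients must be a positive integer")
    return target_throughput / clients


def seconds_between_requests(target_throughput: float, clients: int) -> float:
    """Average gap between two consecutive requests of a single client."""
    return 1.0 / per_client_throughput(target_throughput, clients)


if __name__ == "__main__":
    # Values taken from the schedule above: 8 clients, target-throughput 1000.
    print(per_client_throughput(1000, 8))     # requests per second per client
    print(seconds_between_requests(1000, 8))  # seconds between requests per client
```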

I quoted this paragraph from the Rally docs. And finally, here is the question: when I specify 8 clients for a search operation, does the Elasticsearch cluster receive search requests from 8 different IP addresses, or just from one IP address (perhaps that of the esrally server)?

If you are running just one Rally instance, on one server with only one network interface that can connect to the various Elasticsearch nodes, then the source address will naturally be the same regardless of the number of clients. There are no other IP addresses available to use, so how would an established connection have any other source IP?

Now, if you have a custom configuration on your server with several network interfaces (say eth0, eth1 etc.) configured such that e.g. eth0 -> es0, eth1 -> es1 etc. then, yes, different connections will use different IP addresses, depending on the node that the elasticsearch python client decided to connect to.

Finally, if you use Rally in distributed mode, where you are running several Rally instances on different machines, then it's like the previous case: if you specify >1 clients, the various connections will have different source IP addresses, since each Rally process runs on a different machine with its own network interface and its own dedicated IP.

As I said before, the routing decision is done at the operating system level similarly to what ip route get <some es node ip address> would show you.

The assumption that master nodes distribute requests depending on the client's IP address is wrong. All nodes route requests to other nodes, but routing doesn't depend on the source IP.

See also the "Query Phase" chapter of Elasticsearch: The Definitive Guide for a deeper explanation:

When a search request is sent to a node, that node becomes the coordinating node. It is the job of this node to broadcast the search request to all involved shards, and to gather their responses into a globally sorted result set that it can return to the client.
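The gather step described in that quote can be pictured with a toy example. This is a simplified illustration and not Elasticsearch code; `coordinate_search`, the `(score, doc_id)` tuples, and the sample data are all made up.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def coordinate_search(shard_results: List[List[Hit]], size: int) -> List[Hit]:
    """Merge per-shard hits, each sorted by descending score, into a global top `size`."""
    # heapq.merge expects inputs sorted ascending by key, so negate the score.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, size))


if __name__ == "__main__":
    # Pretend three shards each returned their local top hits.
    shards = [
        [(0.9, "doc-a"), (0.4, "doc-b")],
        [(0.8, "doc-c"), (0.1, "doc-d")],
        [(0.7, "doc-e")],
    ]
    print(coordinate_search(shards, 3))
```

Whichever node receives the request performs this merge, and nothing in it depends on the client's source IP.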

Ideally, if your target environment has dedicated master nodes, you'll exclude those from --target-hosts so that they can focus on their master responsibilities.
