Rally throughput counter includes data generation time

We have some large JSON docs we use for testing. Manipulating each document before ingest takes some time. I can see that Rally's final output includes throughput, but that is Rally's own throughput, and it includes the time spent generating the buffer array.
Does anyone know how to get the ES ingest rate metric, as it isn't included in node-stats?

Thanks in advance
Lasse Nedergaard

Hi @Lasse_Nedergaard ,

As Rally is keeping track of the number of documents it is indexing, the throughput from the client side and from the server side (the Elasticsearch ingest rate) will be the same.

Your question suggests that you believe you have a client-side (or network) bottleneck. We don't typically concern ourselves too much with this (in real-world scenarios, composing bulk requests also takes some amount of time) unless:

  • The data generation code is in rough shape and needs some optimization OR
  • The client (Rally) machine is not powerful enough to generate load at the desired rate

If you are using a persistent data store (which is recommended), you can explore the results in rally-metrics-*. Look at the records where the name field is "latency" and the task field is "bulk" (or whatever you have named your bulk task), and compare the meta.took field to the value field. Both are expressed in milliseconds, so the difference shows roughly what the latency overhead of your client and network is. That lets you assess whether you need to optimize your track code or upgrade your client machine.
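As a rough illustration of that comparison, here is a minimal Python sketch. It assumes you have already fetched the matching documents from rally-metrics-* as plain dicts. The field names (name, task, value, meta.took) are the ones mentioned above; the query body assumes name and task are keyword fields, and the function names and sample records are made up for the example.

```python
from statistics import mean

# Query body selecting latency samples of the bulk task.
# Assumption: "name" and "task" are keyword fields, so term filters match exactly.
LATENCY_QUERY = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"name": "latency"}},
                {"term": {"task": "bulk"}},  # replace with your bulk task name
            ]
        }
    }
}


def client_overhead_ms(records):
    """Return value - meta.took (in ms) for every record that has both fields."""
    overheads = []
    for rec in records:
        took = rec.get("meta", {}).get("took")
        value = rec.get("value")
        if took is None or value is None:
            continue  # skip samples without a server-side took time
        overheads.append(value - took)
    return overheads


def summarize(records):
    """Summarize the client and network overhead, or return None if no usable samples."""
    overheads = client_overhead_ms(records)
    if not overheads:
        return None
    return {
        "samples": len(overheads),
        "mean_overhead_ms": mean(overheads),
        "max_overhead_ms": max(overheads),
    }


# Hypothetical records, shaped like the documents described above.
sample = [
    {"name": "latency", "task": "bulk", "value": 120.0, "meta": {"took": 100}},
    {"name": "latency", "task": "bulk", "value": 150.0, "meta": {"took": 110}},
    {"name": "latency", "task": "bulk", "value": 90.0, "meta": {}},
]
summary = summarize(sample)
print(summary)
```

A mean overhead that is large relative to took would point at the client or the network rather than at Elasticsearch.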

Please let us know if this helps
Rick B

Hi Rick

Thanks for clearing this up, it makes sense. I will give it a try.
And you are right, my Rally client does not perform at 100%, so my problem is likely there.

Thanks for helping out

Lasse Nedergaard
