Hi @Lasse_Nedergaard ,
Since Rally keeps track of the number of documents it indexes, the throughput measured on the client side and on the server side (the Elasticsearch ingest rate) will be the same.
Your question suggests you suspect a client-side (or network) bottleneck. We typically don't worry about this too much (in real-world scenarios, composing bulk requests also takes some amount of time) unless:
- The data generation code is in rough shape and needs optimization, OR
- The client (Rally) machine is not powerful enough to generate load at the desired rate
If you are using a persistent data store (which is recommended), you can explore the results in `rally-metrics-*`: filter for documents where the `name` field is `latency` and the `task` field is `bulk` (or whatever you have named your bulk task), then compare the `meta.took` field to the `value` field. Both are expressed in milliseconds, so the difference gives you a rough estimate of the latency overhead added by your client and network. That in turn helps you assess whether you need to optimize your track code or upgrade your client machine.
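As a sketch of that comparison in Python: the query body below filters `rally-metrics-*` documents by `name` and `task` (run it with any Elasticsearch client), and the sample hits are hypothetical documents shaped like Rally's metrics records, just to illustrate the arithmetic.

```python
# Query body for rally-metrics-* documents where name is "latency"
# and task is "bulk" (pass this to your Elasticsearch client's search call).
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"name": "latency"}},
                {"term": {"task": "bulk"}},
            ]
        }
    }
}

def client_overhead_ms(hit):
    """Client/network overhead: Rally's measured latency minus the
    server-side took. Both `value` and `meta.took` are in milliseconds."""
    src = hit["_source"]
    return src["value"] - src["meta"]["took"]

# Hypothetical sample hits, shaped like rally-metrics-* documents:
sample_hits = [
    {"_source": {"name": "latency", "task": "bulk",
                 "value": 250.0, "meta": {"took": 180}}},
    {"_source": {"name": "latency", "task": "bulk",
                 "value": 300.0, "meta": {"took": 210}}},
]

overheads = [client_overhead_ms(h) for h in sample_hits]
avg_overhead = sum(overheads) / len(overheads)
print(overheads)      # per-request client/network overhead in ms
print(avg_overhead)   # average overhead in ms
```

If the average overhead is a large fraction of the total latency, that points at the client or network rather than Elasticsearch itself.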
Please let us know if this helps!