We set up a nightly race on an existing cluster to track how the cluster's performance evolves over time.
Each night we delete the indices from the previous race, index them again (3 of them, about 10 GB in total), run queries and aggregations, and finish with a force merge. Everything works fine and all metrics make sense, except one: the indexing time.
Indeed, it has been gradually increasing day after day since the beginning.
One option here is to record a flamegraph during indexing using async-profiler. It will tell us where indexing is spending its time. Given the 4x increase, hopefully we will see clearly what the issue is.
It does, which is why I mentioned that all the other metrics are steady. I am aware that this is not optimal and would understand the possibility of things going in multiple directions in general, but I have a hard time grasping this.
I am not familiar with this at all, but I will try to get this done and report back here.
Indexing time as reported by Rally is a cumulative statistic that the cluster maintains. If you were to invoke a recent version of Rally manually against your cluster, you'd see a warning in the console about ensuring your cluster is in a known good state for benchmark results. If your nodes are not restarted between runs, we would expect this statistic to grow over time.
Sorry, I was mistaken: this comes from the index stats API, not the node stats API. It is cumulative, but it includes all the indices on the cluster, so as long as your benchmark indices are co-located on a cluster with other indices, this stat will keep increasing.
To make sure I understand correctly, which field in the index stats API are we talking about? Is it index_time_in_millis?
If so, I tried to find information in the documentation but couldn't find anything specific about it. Would you be able to point me in the right direction?
Do you mean that it would be the total indexing time (at the thread level) across all indices during the measurement?
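For anyone wanting to check this on their own cluster, here is a minimal sketch of how the cumulative counter could be compared before and after a race, per index. It assumes the field in question is `primaries.indexing.index_time_in_millis` from `GET /_stats/indexing`; the function names and the `http://localhost:9200` address are made up for illustration, and the fetch helper assumes a cluster without authentication or TLS.

```python
import json
import urllib.request
from typing import Dict


def fetch_indexing_stats(base_url: str = "http://localhost:9200") -> dict:
    """Fetch index stats restricted to the indexing metric.

    base_url is a placeholder; adjust it (and add auth) for a real cluster.
    """
    with urllib.request.urlopen(f"{base_url}/_stats/indexing") as resp:
        return json.load(resp)


def index_time_by_index(stats: dict) -> Dict[str, int]:
    """Map each index name to its cumulative primaries index_time_in_millis."""
    return {
        name: body["primaries"]["indexing"]["index_time_in_millis"]
        for name, body in stats.get("indices", {}).items()
    }


def indexing_time_delta(before: dict, after: dict) -> Dict[str, int]:
    """Per-index growth of the counter between two _stats snapshots.

    Indices that are absent from the first snapshot are counted from zero.
    """
    b = index_time_by_index(before)
    a = index_time_by_index(after)
    return {name: a[name] - b.get(name, 0) for name in a}
```

Taking one snapshot right before the race and one right after, then looking only at the benchmark indices in the result, should show how much of the growth comes from the race itself and how much from other indices on the same cluster.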