Suppose I want to benchmark an existing cluster that runs Elasticsearch 2.x. If I use "--pipeline=benchmark-only --target-hosts=...", will Rally detect the Elasticsearch version and check out the appropriate branch of the tracks repository? To test this, I forked the official tracks repository and deleted branch 2, expecting the run to fail, but it didn't. With the 'from-distribution' pipeline, on the other hand, I can specify the version explicitly.
Yes, it will. Rally queries $ES_HOST:$ES_PORT/ (e.g. http://localhost:9200/) to detect the distribution version. It then chooses the most appropriate version of the track (for details see the rally-tracks README). In your case, it would check out branch 2.
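To make the two steps concrete, here is a minimal sketch in Python. It is not Rally's actual code. It assumes the version is read from the "version.number" field of the root endpoint's JSON response, and that branch selection tries the most specific name first (e.g. "2.4.0", then "2.4", then "2") before falling back to "master"; that fallback order is an assumption here, so check the rally-tracks README for the authoritative rules. The function names are hypothetical.

```python
import json
import urllib.request


def detect_distribution_version(host="localhost", port=9200):
    """Read version.number from the cluster's root endpoint (GET /)."""
    url = f"http://{host}:{port}/"
    with urllib.request.urlopen(url) as response:
        info = json.load(response)
    return info["version"]["number"]


def candidate_branches(version):
    """Assumed lookup order: exact version, then major.minor, then major."""
    parts = version.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def best_match(available_branches, version):
    """Pick the most specific existing branch, else fall back to master."""
    for candidate in candidate_branches(version):
        if candidate in available_branches:
            return candidate
    return "master"
```

Against a 2.4.0 cluster, best_match(branches, detect_distribution_version()) would yield "2" as long as the set of branches contains "2" but neither "2.4.0" nor "2.4".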
Am I right that you ran git branch -D 2? If so, that won't help: it only deletes your local branch. Rally fetches from the remote and simply checks out the branch again. You can also see this in the logs. Here is what happens on my machine when I delete branch 2, start Elasticsearch 2.4.0 and run esrally --pipeline=benchmark-only:
2017-05-02 06:07:40,52 rally.launcher INFO Distribution version was not specified by user. Rally-determined version is [2.4.0]
2017-05-02 06:07:40,91 rally.racecontrol INFO Mechanic has started engine successfully.
2017-05-02 06:07:40,91 rally.racecontrol INFO Reloading track based for distribution version [2.4.0]
2017-05-02 06:07:40,802 rally.track INFO Checking out [2] in [/Users/daniel/.rally/benchmarks/tracks/default] for distribution version [2.4.0].
2017-05-02 06:07:40,813 rally.track INFO Rebasing on [2] in [/Users/daniel/.rally/benchmarks/tracks/default] for distribution version [2.4.0].
2017-05-02 06:07:40,914 rally.track INFO Reading track specification file [/Users/daniel/.rally/benchmarks/tracks/default/geonames/track.json].
As you can see from the first message, Rally automatically detected the distribution version and then simply checked out the branch again. If you do everything the same way but start Rally with esrally --pipeline=benchmark-only --offline, Rally will not fetch remote branches:
2017-05-02 06:15:09,629 rally.launcher INFO Distribution version was not specified by user. Rally-determined version is [2.4.0]
2017-05-02 06:15:09,638 rally.racecontrol INFO Mechanic has started engine successfully.
2017-05-02 06:15:09,638 rally.racecontrol INFO Reloading track based for distribution version [2.4.0]
2017-05-02 06:15:09,646 rally.track INFO Checking out [master] in [/Users/daniel/.rally/benchmarks/tracks/default] for distribution version [2.4.0].
2017-05-02 06:15:09,655 rally.track INFO Reading track specification file [/Users/daniel/.rally/benchmarks/tracks/default/geonames/track.json].
As you can see, Rally now checks out the master branch because branch 2 no longer exists locally, and later on the benchmark will fail.
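The difference between the two runs can be modelled in a few lines of Python. This is a simplified, hypothetical model rather than Rally's implementation: it assumes that online, deleted local branches are effectively restored from the remote (as the "Checking out [2]" log line suggests), while with --offline only branches that still exist locally are considered. The branch sets are illustrative, not the real contents of rally-tracks.

```python
def available_branches(local, remote, offline):
    """Branches that can be checked out in this simplified model.

    Online: after a fetch, a deleted local branch can be recreated from
    its remote counterpart, so remote branches count as available.
    Offline: nothing is fetched, so only surviving local branches count.
    """
    if offline:
        return set(local)
    return set(local) | set(remote)


def pick_branch(branches, version):
    """Most specific match (e.g. "2.4.0", "2.4", "2"), else "master"."""
    parts = version.split(".")
    for i in range(len(parts), 0, -1):
        candidate = ".".join(parts[:i])
        if candidate in branches:
            return candidate
    return "master"


# Hypothetical scenario: branch 2 deleted locally, still present on the remote.
local = {"master", "5"}
remote = {"master", "2", "5"}
print(pick_branch(available_branches(local, remote, offline=False), "2.4.0"))  # 2
print(pick_branch(available_branches(local, remote, offline=True), "2.4.0"))  # master
```

The two printed values mirror the two log excerpts: branch 2 in the normal run, master in the offline run.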
By the way, you can also specify --distribution-version with the benchmark-only pipeline. This overrides Rally's autodetection, but usually you don't need it.
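The precedence rule can be sketched as follows, matching the log message "Distribution version was not specified by user. Rally-determined version is [...]". This is an illustrative Python sketch, not Rally's argument handling: the parser below only knows the two flags mentioned in this thread, and resolve_distribution_version is a hypothetical name. The detection callback is only invoked when no explicit version was given.

```python
import argparse


def resolve_distribution_version(argv, detect):
    """Use --distribution-version if given; otherwise call detect()."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pipeline")
    parser.add_argument("--distribution-version", default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.distribution_version is not None:
        return args.distribution_version  # explicit user override
    return detect()  # autodetect from the running cluster
```

For example, resolve_distribution_version(["--pipeline=benchmark-only", "--distribution-version=2.4.0"], detect) returns "2.4.0" without querying the cluster at all.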