Esrally not using current git branch

bherring · June 29, 2017, 5:15pm

I have added a new default track within branch "5" and committed the related files. However, when I run esrally it always seems to use the master branch to find the track definitions, even when my current branch is 5. Also, after running esrally, a side effect is that git has been switched to the master branch.
My understanding is that the default behavior for --revision is "current", meaning "use whatever the current git branch is" for building the benchmark candidate, but it doesn't seem to be behaving that way for me. What am I missing?

Example:

git checkout 5
esrally list tracks
...doesn't show the latest challenges for my track on branch 5...
Now git says I'm on the master branch.

Log snippet:

2017-06-29 17:07:25,569 PID:35706 rally.main INFO OS [posix.uname_result(sysname='Linux', nodename='jumphost', release='3.13.0-117-generic', version='#164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017', machine='x86_64')]
2017-06-29 17:07:25,569 PID:35706 rally.main INFO Python [namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-34', hexversion=50594800, name='cpython', version=sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0))]
2017-06-29 17:07:25,570 PID:35706 rally.main INFO Rally version [0.6.0]
2017-06-29 17:07:25,570 PID:35706 rally.main INFO Command line arguments: Namespace(advanced_config=False, assume_defaults=False, auto_manage_indices=None, car='defaults', challenge=None, client_options='timeout:60000,request_timeout:60000', cluster_health='green', configuration='tracks', configuration_name=None, data_paths=None, distribution_repository='release', distribution_version='', effective_start_date=None, enable_driver_profiling=False, laps=1, limit=10, logging='file', offline=False, override_src_dir=None, pipeline='', preserve_install='False', quiet=False, report_file='', report_format='markdown', revision='current', subcommand='list', target_hosts=None, telemetry='', test_mode=False, track='geonames', track_repository='default', user_tag='')
2017-06-29 17:07:25,570 PID:35706 rally.net INFO Rally connects directly to the Internet (no proxy support).
2017-06-29 17:07:25,649 PID:35706 rally.main INFO Detected a working Internet connection.
2017-06-29 17:07:25,725 PID:35706 rally.process INFO Skipping myself (PID [35706]).
2017-06-29 17:07:25,900 PID:35706 rally.track INFO Checking out [master] in [/home/m2/data/.rally/benchmarks/tracks/default] for distribution version [None].
2017-06-29 17:07:25,905 PID:35706 rally.track INFO Rebasing on [master] in [/home/m2/data/.rally/benchmarks/tracks/default] for distribution version [None].
2017-06-29 17:07:25,936 PID:35706 rally.track INFO Reading track specification file [/home/m2/data/.rally/benchmarks/tracks/default/logging/track.json].
2017-06-29 17:07:25,949 PID:35706 rally.track INFO Final rendered track for '/home/m2/data/.rally/benchmarks/tracks/default/logging/track.json':
...

Thanks greatly, in advance!

danielmitterdorfer · June 30, 2017, 6:31am

Hi @bherring,

I understand that this behavior is puzzling. The short answer is that you need to be a bit more explicit in your special case and help Rally by specifying --pipeline=from-sources-complete --distribution-version=5.3.0-SNAPSHOT (the exact version does not matter much; what matters is that the major version is correct).

The longer explanation is as follows:

Rally supports two modes: You can either benchmark a released version of Elasticsearch or build Elasticsearch from sources and benchmark that version. You seem to do the latter but not on the master branch.

Rally detects the Elasticsearch version automatically based on the output of the cluster info API. But to get that output it needs to start Elasticsearch first. Rally also provides a feature to define custom cluster settings in your tracks, and this is where the problem lies. It needs to know the version of Elasticsearch to check out the correct version of the track before it has started the cluster, because the cluster settings will be part of elasticsearch.yml for that cluster. The only case where the version is not known up front is when you are benchmarking Elasticsearch from sources. In that case Rally assumes that you are developing on the master branch of Elasticsearch and therefore checks out the corresponding master branch of your track. After it has started the cluster, it will run its detection logic and reload the track with the correct version.
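To illustrate the fallback described above, here is a minimal sketch in Python. It is not Rally's actual code: the function name select_track_branch, the available_branches parameter, and the rule "use the major version as the branch name" are assumptions made for illustration only.

```python
import re


def select_track_branch(distribution_version, available_branches):
    """Pick a track repository branch for an Elasticsearch version.

    Hypothetical helper for illustration; not part of Rally.
    """
    if not distribution_version:
        # Version not known before the cluster starts (benchmarking from
        # sources without --distribution-version): assume master.
        return "master"
    # Assumption: track branches are named after the major version, e.g. "5".
    match = re.match(r"(\d+)", distribution_version)
    if match and match.group(1) in available_branches:
        return match.group(1)
    # No matching branch: fall back to master.
    return "master"


branches = {"master", "5"}
print(select_track_branch(None, branches))              # master
print(select_track_branch("5.3.0-SNAPSHOT", branches))  # 5
```

With no version given, the sketch picks master, which matches the "Checking out [master] ... for distribution version [None]" line in the log above.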

Therefore, you can help Rally by defining the distribution version with e.g. --distribution-version=5.3.0. Then it will check out the correct branch from the beginning. I think I could implement some git magic to detect that you are benchmarking from sources and determine the branch that you've checked out, but I also think that this would introduce a lot of complexity (what if you are on a feature branch, etc.?).

If you define the distribution version, Rally will assume that you want to benchmark a released version of Elasticsearch. As you want to do something else, you need to explicitly tell it to still benchmark Elasticsearch from sources by specifying --pipeline=from-sources-complete. Note that if you want to repeat that benchmark without rebuilding Elasticsearch, you can specify --pipeline=from-sources-skip-build and save a little bit of time.
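Putting the two flags together, the command lines could be assembled as in the following sketch. The helper esrally_command and its parameters are hypothetical. Only the flag names and pipeline values come from this thread, and placing the flags after the subcommand is an assumption.

```python
def esrally_command(distribution_version, skip_build=False, subcommand=()):
    """Build an esrally argument list for benchmarking from sources.

    Hypothetical helper for illustration; not part of Rally.
    """
    # Reuse an earlier build when skip_build is set, otherwise build from sources.
    pipeline = "from-sources-skip-build" if skip_build else "from-sources-complete"
    return [
        "esrally",
        *subcommand,
        "--pipeline=" + pipeline,
        "--distribution-version=" + distribution_version,
    ]


# First run: build Elasticsearch from sources on the 5.x branch.
print(" ".join(esrally_command("5.3.0-SNAPSHOT")))
# Repeat run: skip the build step.
print(" ".join(esrally_command("5.3.0-SNAPSHOT", skip_build=True)))
# Listing tracks with the same flags, as in the original question.
print(" ".join(esrally_command("5.3.0-SNAPSHOT", subcommand=("list", "tracks"))))
```

The resulting list can be passed to subprocess.run if you want to launch Rally from a script.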

Daniel

system · July 28, 2017, 6:31am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.