tail -f of the log file rally_out.log shows that the benchmark run gets stuck at some point.
tail -f /rally/esrally/.rally/benchmarks/races/2016-10-12-18-23-47/local/logs/rally_out.log
2016-10-12 18:23:56,598 rally.driver INFO Skipping 1750 lines in [/rally/esrally//.rally/benchmarks/data/tiny/documents.json].
2016-10-12 18:24:01,600 rally.driver INFO client [1] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,601 rally.driver INFO client [3] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,601 rally.driver INFO client [0] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,602 rally.driver INFO client [4] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,602 rally.driver INFO client [6] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,602 rally.driver INFO client [5] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,603 rally.driver INFO client [7] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,603 rally.driver INFO client [2] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,606 rally.driver INFO All drivers completed their operations until join point [1/13].
Also, running 'ps -ef | grep esrally' shows a bunch of esrally child processes running.
So, how can we run the benchmark in the background?
If you log out of your SSH session, you'll get an info message that you have "running jobs", which is fine. Rally will continue to run after you log out.
For our nightly benchmarks, we have a wrapper script around Rally that calls Rally with different tracks / challenges. If you want to do something similar then just replace the Rally invocation with the shell script. I.e.:
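A minimal sketch of such a background invocation, assuming a wrapper script named run_rally.sh in the current directory. The helper run_detached, the variable DETACHED_PID and the log file name are hypothetical and chosen for illustration; none of them are part of Rally:

```shell
# Hypothetical helper (not part of Rally): start a command so that it
# keeps running after logout. stdout and stderr are appended to one
# log file, and stdin is detached from the terminal.
run_detached() {
  local log_file="$1"
  shift
  nohup "$@" >> "$log_file" 2>&1 < /dev/null &
  DETACHED_PID=$!   # PID of the background job, for later inspection
}

# Example (assumes a wrapper script ./run_rally.sh exists):
#   run_detached rally_stdout.log ./run_rally.sh
#   echo "started with PID $DETACHED_PID"
```

Redirecting stdin from /dev/null and both output streams to a file means the process holds no reference to the terminal, which is what lets the SSH session close cleanly.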
I'm getting some weirdness with both "nohup" and "screen": the esrally run completes, but the esrally process keeps running indefinitely.
example:
run esrally with either nohup or screen, like so (copied from the above example):
I regularly run this command for different benchmarks (incl. the CSV file option) and have not experienced such a problem yet. Our benchmarking machines all run Ubuntu 16.04.
Do you use just SSH to connect to your benchmark machine? Are you using tmux or anything like that? Are you again on Amazon Linux or any other OS?
Can you please also run the following command?
sudo netstat -a -n -p | grep python
I am looking specifically for a line like this one:
Hi. I have a script that runs the built-in benchmark tracks and challenges. It seems to hang at times. It's long-running, and AWS frequently kicks me out. I'm trying to run it in the background, but I'm not having any luck. I tried using your script above:
nohup ./run_rally.sh >> log.rally_stdout 2>> log.rally_stderr < /dev/null &
for t in "${Tracks[@]}"
do
  sudo mkdir -p -m ugo+rwx "/var/log/rally/$t"
  for ch in "${Challenges[@]}"
  do
    sudo mkdir -p -m ugo+rwx "/var/log/rally/$t/$ch"
    for c in "${Cars[@]}"
    do
      echo "Running Rally Track=$t Challenge=$ch Car=$c"
      esrally --track="$t" --challenge="$ch" --pipeline=benchmark-only --distribution-version=2.4.2 --target-hosts=[myhost]:9200 --client-options=timeout:10,request_timeout:10 --report-file="/var/log/rally/$t/$ch/$c" --report-format=csv
    done
  done
done
But all I get are my echoes in the log file. Nothing in stderr. Since it's in the background, I've lost control of the threads, so I end up needing to kill them manually.
The tracks that complete do put log files into /var/log/rally. But I just can't tell where the track hangs or figure out what's gone wrong when it does.
What you put in /var/log/rally is just the final report. Rally stores its logs in ~/.rally/logs/. Maybe they reveal a little bit more?
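To see where a run stopped, one option is to follow the newest file in that log directory. The sketch below uses ~/.rally/logs as the default location, as stated above; latest_log is a hypothetical helper, not a Rally command:

```shell
# Hypothetical helper: print the path of the most recently modified
# entry in a log directory (default: ~/.rally/logs).
latest_log() {
  local log_dir="${1:-$HOME/.rally/logs}"
  local newest
  newest=$(ls -t -- "$log_dir" 2>/dev/null | head -n 1)
  [ -n "$newest" ] || return 1   # directory is missing or empty
  printf '%s/%s\n' "$log_dir" "$newest"
}

# Example: follow the newest log while a benchmark is running.
#   tail -f "$(latest_log)"
```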
By the way, your benchmark approach is problematic because you run multiple benchmarks but do not restart Elasticsearch in between. I'd expect that benchmarks that run at a later point in time will suffer because there are more indices around than in a clean environment and the results are also less reproducible.
We do something similar in our nightly benchmarks but we let Rally manage Elasticsearch (see docs for details).
Thanks @danielmitterdorfer. I'll give your suggestion to delete indices between benchmarks a try. We're using the benchmark-only pipeline because we're comparing our own Elasticsearch clusters in AWS using different instance/disk types.