How to run esrally in the background?


#1

Hi @danielmitterdorfer ,

I'm trying to run the benchmark in the background, so I don't have to worry about my SSH connection to the server dropping out.

I've tried simply adding "&" to the end of the esrally line, like this:

$ esrally --pipeline=from-distribution --distribution-version=5.0.0-alpha5 --track tiny &

This does not work. I get the following issues:

  • tail -f of the log file rally_out.log shows that the benchmark run gets stuck at some point:

tail -f /rally/esrally/.rally/benchmarks/races/2016-10-12-18-23-47/local/logs/rally_out.log
2016-10-12 18:23:56,598 rally.driver INFO Skipping 1750 lines in [/rally/esrally//.rally/benchmarks/data/tiny/documents.json].
2016-10-12 18:24:01,600 rally.driver INFO client [1] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,601 rally.driver INFO client [3] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,601 rally.driver INFO client [0] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,602 rally.driver INFO client [4] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,602 rally.driver INFO client [6] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,602 rally.driver INFO client [5] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,603 rally.driver INFO client [7] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,603 rally.driver INFO client [2] reached join point [JoinPoint(1)].
2016-10-12 18:24:01,606 rally.driver INFO All drivers completed their operations until join point [1/13].

  • Also, running 'ps -ef | grep esrally' shows a bunch of esrally child processes still running.

So, how can we run the benchmark in the background?

Thanks in advance,
Uncas


(Daniel Mitterdorfer) #2

Hi @uncas,

This should do the trick:

nohup esrally --pipeline=from-distribution --distribution-version=5.0.0-alpha5 --track tiny >> rally_stdout.log 2>> rally_stderr.log < /dev/null &

If you log out of your SSH session, you'll get an info message that you have "running jobs", which is fine. Rally will continue to run after you log out.

For our nightly benchmarks, we have a wrapper script around Rally that calls Rally with different tracks / challenges. If you want to do something similar, just replace the Rally invocation with the shell script, i.e.:

nohup my-rally-wrapper-script.sh >> rally_stdout.log 2>> rally_stderr.log < /dev/null &
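
The same pattern can be wrapped in a small helper so that the redirections are not retyped each time. This is only a sketch: `run_detached` is a hypothetical function name, not part of Rally, and the log file names are whatever you pass in.

```shell
# Hypothetical helper (not part of Rally): start a command detached from the terminal.
# Usage: run_detached <stdout-log> <stderr-log> <command> [args...]
run_detached() {
  out_log=$1
  err_log=$2
  shift 2
  # nohup makes the command ignore SIGHUP; reading stdin from /dev/null
  # keeps the job from waiting on terminal input after logout.
  nohup "$@" >>"$out_log" 2>>"$err_log" </dev/null &
  # Print the PID of the background job so it can be checked or killed later.
  echo $!
}
```

For example, `run_detached rally_stdout.log rally_stderr.log my-rally-wrapper-script.sh` prints the PID of the background job and returns immediately.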

I hope that helps.

Daniel


#3

@danielmitterdorfer,
That is excellent. Thank you!


(Alexander Gray II) #4

I'm getting some weirdness with both "nohup" and "screen": the esrally run completes, but the esrally process keeps running indefinitely.
Example: run esrally with either nohup or screen, like so (copied from the example above):

nohup my-rally-wrapper-script.sh >> rally_stdout.log 2>> rally_stderr.log < /dev/null &

After a while you'll see a bunch of esrally processes, and then you will be left with a single process that never ends:

/usr/bin/python3.4 /usr/local/bin/esrally --pipeline=from-distribution --distribution-version=5.0.0 --track tiny --report-format csv --report-file /tmp/rally_report.csv

Everything is done, but that first process is still running.

What I'm doing to get around this is running with nohup and watching rally_stdout.log. If I see:

Logs for this race are archived

I know the Rally run is done, and then I can collect the CSV file.

Could it be related to using the CSV report option?
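
A minimal sketch of automating that check, assuming the marker text quoted above appears in the stdout log exactly as shown. `wait_for_marker` is a hypothetical helper name, and the one-second polling interval and default timeout are arbitrary choices.

```shell
# Hypothetical helper: poll a log file until a marker string appears.
# Usage: wait_for_marker <log-file> <marker> [timeout-seconds]
# Returns 0 once the marker is found, 1 if the timeout expires first.
wait_for_marker() {
  log_file=$1
  marker=$2
  timeout_s=${3:-3600}
  waited=0
  while :; do
    # -F treats the marker as a fixed string, not a regular expression.
    if [ -f "$log_file" ] && grep -qF -- "$marker" "$log_file"; then
      return 0
    fi
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `wait_for_marker rally_stdout.log "Logs for this race are archived" 7200 && echo "race finished"` would block until the line shows up or two hours pass.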


(Daniel Mitterdorfer) #5

Hi @Alexander_Gray_II,

I regularly run this command for different benchmarks (including the CSV file option) and have not experienced such a problem so far. Our benchmarking machines all run Ubuntu 16.04.

Do you use just SSH to connect to your benchmark machine? Are you using tmux or anything like that? Are you again on Amazon Linux or any other OS?

Can you please also run the following command?

sudo netstat -a -n -p | grep python

I am looking specifically for a line like this one:

tcp        0      0 0.0.0.0:1900            0.0.0.0:*               LISTEN      29560/python3

This would indicate that Rally's main process is still running (which opens port 1900 for communicating with other processes).
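
If you want to script this check, here is a rough sketch that filters the netstat output for such a line. `rally_port_listening` is a hypothetical helper name, and it assumes the column layout of the sample line above (local address in column 4, state in column 6, PID/program in column 7).

```shell
# Hypothetical helper: read `netstat -a -n -p` output on stdin and succeed
# if a python process is listening on the given port (default 1900).
rally_port_listening() {
  port=${1:-1900}
  awk -v port="$port" '
    # Column 4 is the local address, column 6 the state, column 7 PID/program.
    $6 == "LISTEN" && $4 ~ (":" port "$") && $7 ~ /python/ { found = 1 }
    END { exit !found }
  '
}
```

Usage would be along the lines of `sudo netstat -a -n -p | rally_port_listening 1900 && echo "Rally main process still running"`.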

Daniel


(Matt O'hara) #6

Hi. I have a script that runs the built-in benchmark tracks and challenges. It seems to hang at times. It's long-running, and AWS frequently kicks me out. I'm trying to run it in the background, but I'm not having any luck. I tried using the command you posted above:
nohup ./run_rally.sh >> log.rally_stdout 2>> log.rally_stderr < /dev/null &

Where my run_rally.sh looks like:
#!/bin/bash
Tracks=( geonames geopoint logging nyc_taxis percolator pmc )
Challenges=( append-no-conflicts append-no-conflicts-index-only append-fast-no-conflicts append-fast-with-conflicts )
Cars=( default )

for t in "${Tracks[@]}"
do
  # Report directory per track (note the plain ASCII hyphen in -p)
  sudo mkdir -p "/var/log/rally/$t" -m ugo+rwx
  for ch in "${Challenges[@]}"
  do
    # Report directory per track/challenge combination
    sudo mkdir -p "/var/log/rally/$t/$ch" -m ugo+rwx
    for c in "${Cars[@]}"
    do
      echo "Running Rally Track=$t Challenge=$ch Car=$c"
      esrally --track="$t" --challenge="$ch" --pipeline=benchmark-only --distribution-version=2.4.2 --target-hosts=[myhost]:9200 --client-options=timeout:10,request_timeout:10 --report-file="/var/log/rally/$t/$ch/$c" --report-format=csv
    done
  done
done

But all I get are my echoes in the log file. Nothing in stderr. Since it's in the background, I've lost control of the processes, so I end up needing to kill them manually.

The tracks that complete do put report files into /var/log/rally. But I just can't tell where the track hangs or figure out what's gone wrong when it does.


(Daniel Mitterdorfer) #7

Hi @mhohara,

What you put in /var/log/rally is just the final report. Rally stores its logs in ~/.rally/logs/. Maybe they reveal a little bit more?
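
As a small convenience, the most recently modified file in that directory can be picked out like this. It is only a sketch: `latest_log` is a hypothetical helper name, and it makes no assumption about how Rally names its log files beyond them living in the directory mentioned above.

```shell
# Hypothetical helper: print the name of the most recently modified entry
# in a log directory (default: ~/.rally/logs).
latest_log() {
  dir=${1:-$HOME/.rally/logs}
  # ls -t sorts by modification time, newest first.
  ls -t "$dir" 2>/dev/null | head -n 1
}
```

Something like `tail -f ~/.rally/logs/"$(latest_log)"` would then follow the newest log while a benchmark runs in the background.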

By the way, your benchmark approach is problematic because you run multiple benchmarks but do not restart Elasticsearch in between. I'd expect that benchmarks that run at a later point in time will suffer because there are more indices around than in a clean environment and the results are also less reproducible.

We do something similar in our nightly benchmarks but we let Rally manage Elasticsearch (see docs for details).

Daniel


(Matt O'hara) #8

Thanks @danielmitterdorfer. I'll give your suggestion to delete indices between benchmarks a try. We're using the benchmark-only pipeline because we're comparing our own ES clusters in AWS using different instance/disk types.

