Distributing the load test driver seems to be stuck (?)

sharathr83 · September 18, 2017, 3:14am

I am trying to benchmark my ES cluster using multiple load test drivers (actually, just one load test driver). I followed the documentation (for 0.7.1) and did the following:

I have 2 Rally runner nodes. I am using one as the benchmark coordinator and the other as a load driver. I have started the Rally daemon on both of them. The daemon on the benchmark coordinator started with this message:

[INFO] Successfully started actor system on node [node-ip-1] with coordinator node IP [node-ip-1].

On the load driver node, it started with this:
[INFO] Successfully started actor system on node [node-ip-2] with coordinator node IP [node-ip-1].

Both runner nodes have a copy of the same data. I ran this from the benchmark coordinator:

esrally race --track=track-name --target-hosts=cluster-node-1:9200,cluster-node-2:9200,cluster-node-3:9200,cluster-node-4:9200,cluster-node-5:9200 --pipeline=benchmark-only --report-file=result.csv --report-format=csv --load-driver-hosts=node-ip-2
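For readers who script such runs, here is a minimal sketch that assembles the same invocation from Python lists, which makes it easier to see that --target-hosts names the Elasticsearch nodes while --load-driver-hosts names the Rally nodes that generate load. The helper name build_race_command is hypothetical; it only uses the flags that appear in the command above.

```python
import shlex


def build_race_command(track, target_hosts, load_driver_hosts, report_file="result.csv"):
    """Assemble the esrally invocation from the post as an argument list.

    Hypothetical helper for illustration; it uses only the flags quoted above.
    """
    return [
        "esrally", "race",
        f"--track={track}",
        # Elasticsearch nodes that receive the benchmark traffic
        "--target-hosts=" + ",".join(target_hosts),
        "--pipeline=benchmark-only",
        f"--report-file={report_file}",
        "--report-format=csv",
        # Rally nodes (running the Rally daemon) that generate the load
        "--load-driver-hosts=" + ",".join(load_driver_hosts),
    ]


targets = [f"cluster-node-{i}:9200" for i in range(1, 6)]
command = build_race_command("track-name", targets, ["node-ip-2"])

# Print a copy-pasteable command line; run it on the benchmark coordinator.
print(" ".join(shlex.quote(arg) for arg in command))
```

The argument list can also be passed to subprocess.run(command) on the coordinator node instead of being printed.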

The logs on the benchmark coordinator node seem to be stuck at this:

2017-09-18 03:10:22,442 PID:59724 rally.net INFO Rally connects directly to the Internet (no proxy support).
2017-09-18 03:10:22,721 PID:59724 rally.main INFO Detected a working Internet connection.
2017-09-18 03:10:22,807 PID:59724 rally.process INFO Skipping myself (PID [59724]).
2017-09-18 03:10:22,809 PID:59724 rally.main INFO Actor system already running locally? [True]
2017-09-18 03:10:22,810 PID:59724 rally.actor INFO Joining already running actor system with system base [multiprocTCPBase].
2017-09-18 03:10:22,833 PID:59724 rally.racecontrol INFO User specified pipeline [benchmark-only].
2017-09-18 03:10:22,833 PID:59724 rally.racecontrol INFO Using configured hosts [{'host': 'cluster-node-1', 'port': 9200}, {'host': 'cluster-node-2', 'port': 9200}, {'host': 'cluster-node-3', 'port': 9200}, {'host': 'cluster-node-4', 'port': 9200}, {'host': 'cluster-node-5', 'port': 9200}]
2017-09-18 03:10:22,833 PID:59724 rally.actor INFO Joining already running actor system with system base [multiprocTCPBase].

It seems to be stuck there indefinitely. Is there anything I am missing?

danielmitterdorfer · September 18, 2017, 11:39am

Hi @sharathr83,

In general, what you are doing seems fine. You are not seeing more log output here because, at this point, the main Rally process has the Rally daemon create a new subprocess on its behalf. All such processes log their output to ~/.rally/logs/rally-actor-messages.log.

I think you're hitting a problem that I have since fixed; the fix has just been released with Rally 0.7.2. Can you please upgrade and retry? If it is still not working, can you please check the output of ~/.rally/logs/rally-actor-messages.log on the coordinator node?
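As a rough aid for that check, here is a minimal sketch that scans the actor log for lines above INFO level. It assumes the actor log uses the same line layout as the main log excerpt quoted above (timestamp, PID, logger name, level, message) and the standard Python level names; neither is confirmed in this thread, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed layout, taken from the main log excerpt in the question:
# "2017-09-18 03:10:22,442 PID:59724 rally.net INFO <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"PID:(?P<pid>\d+) (?P<logger>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def find_problems(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return (timestamp, logger, level, message) for lines at the given levels."""
    problems = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            problems.append(
                (match.group("ts"), match.group("logger"),
                 match.group("level"), match.group("msg"))
            )
    return problems


def scan_actor_log(path=Path.home() / ".rally" / "logs" / "rally-actor-messages.log"):
    """Scan the actor log mentioned above; return an empty list if it is missing."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        return find_problems(handle)


if __name__ == "__main__":
    for ts, logger, level, msg in scan_actor_log():
        print(f"{ts} {logger} {level} {msg}")
```

Lines that do not match the assumed layout, such as stack trace continuations, are skipped, so an empty result does not prove the log is clean; reading the tail of the file by hand is still worthwhile.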

Daniel

sharathr83 · September 20, 2017, 8:55am

Yes, that worked, thank you!

danielmitterdorfer · September 20, 2017, 9:37am

Glad to hear that. Thanks for the feedback!
