How to set up and test a node on a different server?

Hi,

I would like to run a load test with esrally. I have two servers: one generates the load, and the other should run Elasticsearch as a single instance.

How do I let esrally create the cluster / node on a different machine?

I tried the following, starting esrallyd on both servers:

# on elastic node server
esrallyd start --node-ip 10.0.2.4 --coordinator-ip 10.0.2.6

# on esrally node
esrallyd start --node-ip 10.0.2.6 --coordinator-ip 10.0.2.6

# starting esrally on the esrally node
esrally --pipeline=from-distribution --target-hosts=10.0.2.4:9200 --distribution-version=6.2.2

Here is the log:

2018-03-14 19:46:22,289 PID:3940 rally.main INFO OS [posix.uname_result(sysname='Linux', nodename='esrally', release='4.13.0-1011-azure', version='#14-Ubuntu SMP Thu Feb 15 16:15:39 UTC 2018', machine='x86_64')]
2018-03-14 19:46:22,289 PID:3940 rally.main INFO Python [namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-35', hexversion=50660080, name='cpython', version=sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0))]
2018-03-14 19:46:22,290 PID:3940 rally.main INFO Rally version [0.9.4]
2018-03-14 19:46:22,290 PID:3940 rally.main INFO Command line arguments: Namespace(advanced_config=False, assume_defaults=False, auto_manage_indices=None, car='defaults', car_params='', challenge=None, client_options='timeout:60', cluster_health='green', configuration_name=None, distribution_repository='release', distribution_version='6.2.2', effective_start_date=None, elasticsearch_plugins='', enable_driver_profiling=False, include_tasks=None, java_home=None, keep_cluster_running=False, laps=1, load_driver_hosts='localhost', logging='file', offline=False, on_error='continue', pipeline='from-distribution', plugin_params='', preserve_install='False', quiet=False, report_file='', report_format='markdown', revision='current', runtime_java_home=None, show_in_report='available', subcommand=None, target_hosts='10.0.2.4:9200', team_path=None, team_repository='default', telemetry='', test_mode=False, track=None, track_params='', track_path=None, track_repository='default', use_gradle_wrapper=False, user_tag='')
2018-03-14 19:46:22,290 PID:3940 rally.net INFO Rally connects directly to the Internet (no proxy support).
2018-03-14 19:46:22,585 PID:3940 rally.main INFO Detected a working Internet connection.
2018-03-14 19:46:22,607 PID:3940 rally.process INFO Skipping myself (PID [3940]).
2018-03-14 19:46:22,607 PID:3940 rally.main INFO Actor system already running locally? [True]
2018-03-14 19:46:22,608 PID:3940 rally.actor INFO Joining already running actor system with system base [multiprocTCPBase].
2018-03-14 19:46:22,633 PID:3940 rally.racecontrol INFO User specified pipeline [from-distribution].
2018-03-14 19:46:22,633 PID:3940 rally.racecontrol INFO Using configured hosts [{'host': '10.0.2.4', 'port': 9200}]
2018-03-14 19:46:22,633 PID:3940 rally.actor INFO Joining already running actor system with system base [multiprocTCPBase].
2018-03-14 19:46:30,512 PID:3940 rally.racecontrol ERROR A benchmark failure has occurred
2018-03-14 19:46:30,513 PID:3940 rally.racecontrol INFO Telling benchmark actor to exit.
2018-03-14 19:46:30,514 PID:3940 root ERROR Cannot run subcommand [race].
Traceback (most recent call last):
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/rally.py", line 560, in dispatch_sub_command
    race(cfg)
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/rally.py", line 490, in race
    with_actor_system(lambda c: racecontrol.run(c), cfg)
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/rally.py", line 510, in with_actor_system
    runnable(cfg)
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/rally.py", line 490, in <lambda>
    with_actor_system(lambda c: racecontrol.run(c), cfg)
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/racecontrol.py", line 377, in run
    raise e
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/racecontrol.py", line 374, in run
    pipeline(cfg)
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/racecontrol.py", line 63, in __call__
    self.target(cfg)
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/racecontrol.py", line 314, in from_distribution
    return race(cfg, distribution=True)
  File "/home/butch/.local/lib/python3.5/site-packages/esrally/racecontrol.py", line 276, in race
    raise exceptions.RallyError(result.message, result.cause)
esrally.exceptions.RallyError: (Node [rally-node-0] has terminated with exit code [78]., 'Traceback (most recent call last):\n  File "/usr/lib/python3.4/site-packages/esrally/mechanic/mechanic.py", line 516, in receiveMsg_StartNodes\n    nodes = self.mechanic.start_engine()\n  File "/usr/lib/python3.4/site-packages/esrally/mechanic/mechanic.py", line 646, in start_engine\n    self.nodes = self.launcher.start(node_configs)\n  File "/usr/lib/python3.4/site-packages/esrally/mechanic/launcher.py", line 210, in start\n    return [self._start_node(node_configuration, node_count_on_host, java_major_version) for node_configuration in node_configurations]\n  File "/usr/lib/python3.4/site-packages/esrally/mechanic/launcher.py", line 210, in <listcomp>\n    return [self._start_node(node_configuration, node_count_on_host, java_major_version) for node_configuration in node_configurations]\n  File "/usr/lib/python3.4/site-packages/esrally/mechanic/launcher.py", line 239, in _start_node\n    node_process = self._start_process(env, node_name, binary_path)\n  File "/usr/lib/python3.4/site-packages/esrally/mechanic/launcher.py", line 293, in _start_process\n    raise exceptions.LaunchError(msg)\nesrally.exceptions.LaunchError: (\'Node [rally-node-0] has terminated with exit code [78].\', None)\n')

What am I doing wrong?

Hello @asp,

The last message in your error log (Node [rally-node-0] has terminated with exit code [78]) refers to the Elasticsearch node. In other words, rally-node-0 is the name of the Elasticsearch node that Rally tried to start.

The exit code 78 comes from Elasticsearch and points to a configuration error.

To identify the root cause preventing Elasticsearch from starting, inspect the Elasticsearch logs under ~/.rally/benchmarks/races/<timestamp>/rally-node-0/logs/server/rally-benchmark.log[1] on the target host 10.0.2.4. One possible cause is a failed bootstrap check because vm.max_map_count is set too low, but you should inspect your own logs to see the exact error. Details on configuring this setting can be found in the Elasticsearch docs.

$ grep -R ERROR -A2 ~/.rally/benchmarks/races/*/rally-node-0/logs/server/rally-benchmark.log 
/home/dl/.rally/benchmarks/races/2018-03-15-08-57-46/rally-node-0/logs/server/rally-benchmark.log:[2018-03-15T08:58:07,774][ERROR][o.e.b.Bootstrap          ] [rally-node-0] node validation exception
/home/dl/.rally/benchmarks/races/2018-03-15-08-57-46/rally-node-0/logs/server/rally-benchmark.log-[1] bootstrap checks failed
/home/dl/.rally/benchmarks/races/2018-03-15-08-57-46/rally-node-0/logs/server/rally-benchmark.log-[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
--
/home/dl/.rally/benchmarks/races/2018-03-15-09-00-11/rally-node-0/logs/server/rally-benchmark.log:[2018-03-15T09:00:25,578][ERROR][o.e.b.Bootstrap          ] [rally-node-0] node validation exception
/home/dl/.rally/benchmarks/races/2018-03-15-09-00-11/rally-node-0/logs/server/rally-benchmark.log-[1] bootstrap checks failed
/home/dl/.rally/benchmarks/races/2018-03-15-09-00-11/rally-node-0/logs/server/rally-benchmark.log-[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
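
If your own log shows the same vm.max_map_count failure, the check and the fix could be sketched roughly as follows. This assumes a Linux target host with sysctl available and root access; needs_raise is a hypothetical helper name used only for illustration, and 262144 is the minimum reported in the log output above. The script only prints the commands, it does not change anything itself.

```shell
#!/bin/sh
# Minimum value requested by the bootstrap check in the log output above.
REQUIRED=262144

# needs_raise CURRENT REQUIRED: exits 0 when CURRENT is below REQUIRED.
# Hypothetical helper name, not part of Rally or Elasticsearch.
needs_raise() {
    [ "$1" -lt "$2" ]
}

# Read the current setting; this file exists on Linux only, so fall back to 0.
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)

if needs_raise "$current" "$REQUIRED"; then
    echo "vm.max_map_count is $current, below $REQUIRED"
    echo "raise it for the running system with:"
    echo "  sudo sysctl -w vm.max_map_count=$REQUIRED"
    echo "and persist it across reboots by adding this line to /etc/sysctl.conf:"
    echo "  vm.max_map_count=$REQUIRED"
else
    echo "vm.max_map_count is $current, which is high enough"
fi
```

Run this on the target host (10.0.2.4 in your setup), apply the printed commands if needed, and then start the race again.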

[1] The Elasticsearch errors can also be seen in ~/.rally/rally-actor-messages.log on the target host.

Regards,
Dimitris
