Hi,
I am a newcomer to Rally and having a go at running some benchmarks. My OS is Ubuntu Linux 18.04.1 64-bit and my Rally version is 1.4.1. I have created a custom track and a set of cars, and I have a benchmark that runs nicely by default (i.e. with a vanilla defaults car that does nothing special to start the REST server).
However, when I change the configuration to use a different car mixin, REST requests to the server don't get through. The mixin adds some special Java agent properties to the server's JVM startup, which I can't describe as they are specific to my enterprise. Suffice it to say that the server takes too long to start up, and I don't seem able to increase the server startup timeout so that Rally waits until the server is ready and the agent has done its work.
My rally.ini file is here:
[meta]
config.version = 17
[system]
env.name = local
[node]
root.dir = /home/spayne/.rally/benchmarks
src.root.dir = /home/spayne/.rally/benchmarks/src
[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch
[benchmarks]
local.dataset.cache = /home/spayne/.rally/benchmarks/data
[reporting]
datastore.type = in-memory
datastore.host =
datastore.port =
datastore.secure = False
datastore.user =
datastore.password =
[tracks]
default.url = https://github.com/elastic/rally-tracks
[teams]
default.url = https://github.com/elastic/rally-teams
[defaults]
preserve_benchmark_candidate = False
[distributions]
release.cache = true
An example race's elasticsearch.yml file is here:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: rally-benchmark
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: rally-node-0
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: ['/home/spayne/.rally/benchmarks/races/8fb02ac0-da4d-492b-8f63-ed4bf9422751/rally-node-0/install/elasticsearch-6.6.2/data']
#
# Path to log files:
#
path.logs: /home/spayne/.rally/benchmarks/races/8fb02ac0-da4d-492b-8f63-ed4bf9422751/rally-node-0/logs/server
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 127.0.0.1
#
# Set a custom port for HTTP:
#
http.port: 39200-39300
transport.tcp.port: 39300-39400
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 1
discovery.zen.fd.ping_internal: 30s
discovery.zen.fd.ping_timeout: 120s
discovery.zen.fd.ping_retries: 100
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
And the rally.log file is here (a snippet from the end of the run; there are hundreds of refused-connection messages):
2020-03-02 15:06:07,548 -not-actor-/PID:30588 elasticsearch WARNING GET http://127.0.0.1:39200/_cluster/health?wait_for_nodes=%3E%3D1 [status:N/A request:0.000s]
Traceback (most recent call last):
File "/home/spayne/.local/lib/python3.6/site-packages/urllib3/connection.py", line 157, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/home/spayne/.local/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/home/spayne/.local/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
[...LOTS OF LINES REDACTED TO SAVE SPACE..]
raise e
File "/home/spayne/.local/lib/python3.6/site-packages/esrally/racecontrol.py", line 362, in run
pipeline(cfg)
File "/home/spayne/.local/lib/python3.6/site-packages/esrally/racecontrol.py", line 79, in __call__
self.target(cfg)
File "/home/spayne/.local/lib/python3.6/site-packages/esrally/racecontrol.py", line 290, in from_distribution
return race(cfg, distribution=True)
File "/home/spayne/.local/lib/python3.6/site-packages/esrally/racecontrol.py", line 250, in race
raise exceptions.RallyError(result.message, result.cause)
esrally.exceptions.RallyError: ("Error in driver (('Elasticsearch REST API layer is not available.', None))", None)
A sample command-line request to run a race is:
esrally --car=defaults,my-agent-mixin --pipeline=from-distribution --distribution-version=6.6.2 --track-path=/home/spayne/elasticsearch-src/custom-tracks --team-repository=/home/spayne/elasticsearch-src/teams --report-file=/home/spayne/elasticsearch-src/reports/my-agent-debug.md --report-format=markdown --client-options="timeout:1200" --preserve-install=true
The my-agent-mixin adds a few extra JVM system properties to jvm.options that slow down the startup by running a particular javaagent.
Console output is:
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
[INFO] Preparing for race ...
[INFO] Preserving benchmark candidate installation at [/home/spayne/.rally/benchmarks/races/8fb02ac0-da4d-492b-8f63-ed4bf9422751/rally-node-0/install/elasticsearch-6.6.2].
[ERROR] Cannot race. Error in driver (('Elasticsearch REST API layer is not available.', None))
Getting further help:
*********************
* Check the log files in /home/spayne/.rally/logs for errors.
* Read the documentation at https://esrally.readthedocs.io/en/1.4.1/
* Ask a question on the forum at https://discuss.elastic.co/c/elasticsearch/rally
* Raise an issue at https://github.com/elastic/rally/issues and include the log files in /home/spayne/.rally/logs.
---------------------------------
[INFO] FAILURE (took 152 seconds)
---------------------------------
Whatever I do, I don't seem able to make it take longer than about 150s. (For example, I tried twiddling the zen discovery fault detection timeout and poll interval, which made no difference; I also added a long timeout to the --client-options, likewise with no difference.)
I guess the connection errors in rally.log are because the server hasn't yet finished starting up. The core issue is that I don't seem able to make the process continue for the time needed to finish the slow startup (about 3 to 5 minutes typically).
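To check that guess independently of Rally, something like the following could time how long the REST layer actually takes to come up. This is only a minimal sketch using the Python standard library: the function names, the 600-second limit and the 5-second poll interval are my own placeholders, not anything Rally provides, and it does not change Rally's own timeout.

```python
import time
import urllib.error
import urllib.request


def rest_layer_is_up(url, request_timeout_s=5.0):
    """Return True if an HTTP GET to url receives any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=request_timeout_s):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is at least listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, or timed out: the node is not ready yet.
        return False


def wait_for_rest_layer(url, max_wait_s=600.0, poll_interval_s=5.0,
                        probe=rest_layer_is_up,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll until probe(url) succeeds.

    Returns the elapsed time in seconds, or None if max_wait_s passes first.
    probe, clock and sleep are injectable (hypothetical hooks) so the loop
    can be exercised without a running server.
    """
    start = clock()
    while True:
        if probe(url):
            return clock() - start
        if clock() - start >= max_wait_s:
            return None
        sleep(poll_interval_s)
```

Calling wait_for_rest_layer("http://127.0.0.1:39200/_cluster/health") right after the node process is launched would report how many seconds pass before the HTTP port answers, which would confirm whether the agent-laden startup really exceeds the roughly 150 seconds Rally waits.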
Any suggestions?
Thanks,
Simon