[ERROR] Cannot race. Error in track preparator (('Could not load component [eventdata]', None))

After running this Rally race (and I've been able to run it successfully for several hours beforehand), I suddenly get an error.

[ec2-user@ip-172-30-2-208 ~]$ esrally --distribution-version=7.6.0 --track=eventdata --track-repository=eventdata --challenge=index-logs-fixed-daily-volume --track-params=./params-file.json

This is the error:
[ERROR] Cannot race. Error in track preparator (('Could not load component [eventdata]', None))

Hello,

Few questions:

  1. What's your Rally version? (esrally --version)

  2. Can you please paste the contents of your ~/.rally/logs/rally.log (if it's large, try to paste the pertinent parts to the error, looking at the file from bottom to top).

The eventdata track is as far as I know not installed by default. Have you installed and configured it? Have you run this particular track successfully before?

2020-04-09 02:26:27,435 ActorAddr-(T|:41821)/PID:29187 esrally.mechanic.mechanic WARNING Cannot store system metrics: ('No race with race id [f8fc3c0f-9daa-4e79-ab5b-d08319100387]', None).
2020-04-09 02:26:27,435 ActorAddr-(T|:41821)/PID:29187 esrally.metrics INFO Closing metrics store.
2020-04-09 02:26:27,435 ActorAddr-(T|:41821)/PID:29187 esrally.mechanic.provisioner INFO Wiping benchmark candidate installation at [/home/ec2-user/.rally/benchmarks/races/f8fc3c0f-9daa-4e79-ab5b-d08319100387/rally-node-0/install/elasticsearch-7.6.0].
2020-04-09 02:26:27,511 ActorAddr-(T|:46571)/PID:29150 esrally.actor INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:44945)] (ignoring).
2020-04-09 02:26:30,205 -not-actor-/PID:29143 esrally.rally INFO Attempting to shutdown internal actor system.
2020-04-09 02:26:30,206 -not-actor-/PID:29149 root INFO ActorSystem Logging Shutdown
2020-04-09 02:26:30,227 -not-actor-/PID:29148 root INFO ---- Actor System shutdown
2020-04-09 02:26:30,227 -not-actor-/PID:29143 esrally.rally INFO Actor system is still running. Waiting...
2020-04-09 02:26:31,228 -not-actor-/PID:29143 esrally.rally INFO Shutdown completed.
2020-04-09 02:26:31,228 -not-actor-/PID:29143 esrally.rally ERROR Cannot run subcommand [race].
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/esrally/rally.py", line 714, in dispatch_sub_command
race(cfg)
File "/usr/local/lib/python3.7/site-packages/esrally/rally.py", line 624, in race
with_actor_system(racecontrol.run, cfg)
File "/usr/local/lib/python3.7/site-packages/esrally/rally.py", line 652, in with_actor_system
runnable(cfg)
File "/usr/local/lib/python3.7/site-packages/esrally/racecontrol.py", line 365, in run
raise e
File "/usr/local/lib/python3.7/site-packages/esrally/racecontrol.py", line 362, in run
pipeline(cfg)
File "/usr/local/lib/python3.7/site-packages/esrally/racecontrol.py", line 79, in call
self.target(cfg)
File "/usr/local/lib/python3.7/site-packages/esrally/racecontrol.py", line 290, in from_distribution
return race(cfg, distribution=True)
File "/usr/local/lib/python3.7/site-packages/esrally/racecontrol.py", line 250, in race
raise exceptions.RallyError(result.message, result.cause)
esrally.exceptions.RallyError: ("Error in track preparator (('Could not load component [eventdata]', None))", None)
2020-04-09 02:27:54,488 -not-actor-/PID:29705 esrally.rally INFO Race id [93914fe4-2bbc-400e-a714-eb8c0ea6f41b]
2020-04-09 02:27:54,488 -not-actor-/PID:29705 esrally.rally INFO OS [uname_result(system='Linux', node='ip-172-30-2-208.us-west-2.compute.internal', release='4.14.173-137.228.amzn2.x86_64', version='#1 SMP Thu Mar 19 16:50:21 UTC 2020', machine='x86_64', processor='x86_64')]
2020-04-09 02:27:54,488 -not-actor-/PID:29705 esrally.rally INFO Python [namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-37', hexversion=50792176, name='cpython', version=sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0))]
2020-04-09 02:27:54,488 -not-actor-/PID:29705 esrally.rally INFO Rally version [1.4.1]
2020-04-09 02:27:54,488 -not-actor-/PID:29705 esrally.utils.net INFO Connecting directly to the Internet (no proxy support).
2020-04-09 02:27:54,567 -not-actor-/PID:29705 esrally.rally INFO Detected a working Internet connection.
2020-04-09 02:27:54,592 -not-actor-/PID:29705 esrally.rally INFO Actor system already running locally? [False]
2020-04-09 02:27:54,592 -not-actor-/PID:29705 esrally.actor INFO Starting actor system with system base [multiprocTCPBase] and capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention Address.IPv4': '127.0.0.1:1900'}].
2020-04-09 02:27:54,608 -not-actor-/PID:29711 root INFO ++++ Actor System gen (3, 9) started, admin @ ActorAddr-(T|:1900)
2020-04-09 02:27:54,616 -not-actor-/PID:29705 esrally.racecontrol INFO User specified no pipeline. Automatically derived pipeline [from-distribution].
2020-04-09 02:27:54,617 -not-actor-/PID:29705 esrally.racecontrol INFO Setting default host to [127.0.0.1:39200]
2020-04-09 02:27:54,617 -not-actor-/PID:29705 esrally.actor INFO Joining already running actor system with system base [multiprocTCPBase].
2020-04-09 02:27:54,618 ActorAddr-(T|:1900)/PID:29711 esrally.actor INFO Capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention Address.IPv4': '127.0.0.1:1900', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Python Version': (3, 7, 6, 'final', 0), 'Thespian Generation': (3, 9), 'Thespian Version': '1586399274602'}] match requirements [{'coordinator': True}].
2020-04-09 02:27:54,618 ActorAddr-(T|:1900)/PID:29711 esrally.actor INFO Capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention Address.IPv4': '127.0.0.1:1900', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Python Version': (3, 7, 6, 'final', 0), 'Thespian Generation': (3, 9), 'Thespian Version': '1586399274602'}] match requirements [{'coordinator': True}].
2020-04-09 02:27:55,6 ActorAddr-(T|:41637)/PID:29713 esrally.utils.process INFO From https://github.com/elastic/rally-tracks

[ec2-user@ip-172-30-2-208 logs] esrally --version esrally 1.4.1 [ec2-user@ip-172-30-2-208 logs]

Let me better explain what I was trying to do.

I have been trying to customize the code in the rally-eventdata-track. I need to change the behavior in method generate_event() inside randomevent.py. Rally seemed to allow this for a while, even though it kept giving me nag screen messages, such as:

[ec2-user@ip-172-30-2-208 ~]$ esrally --distribution-version=7.6.0 --track=eventdata --track-repository=eventdata --challenge=index-logs-fixed-daily-volume --track-params=./params-file.json

[WARNING] Local changes in [/home/ec2-user/.rally/benchmarks/tracks/eventdata] prevent tracks update from remote. Please commit your changes.
[INFO] Preparing for race ...
[WARNING] Local changes in [/home/ec2-user/.rally/benchmarks/tracks/eventdata] prevent tracks update from remote. Please commit your changes.

but recently it started giving me this error besides:

[ERROR] Cannot race. Error in track preparator (('Could not load component [eventdata]', None))

Perhaps the problem is that I don't know how to customize the Python code inside the rally-eventdata-track. I've just been modifying the actual running code inside

/home/ec2-user/.rally/benchmarks/tracks/eventdata/eventdata/parameter_sources

which is where it's being run from. Is this not allowed? Is there a better way to do this?

What are you looking to modify?

I need to make a copy of the test corpus for my own testing

You can find a pregenerated corpus in https://github.com/elastic/rally-tracks/blob/a31078e4eb53032a94a9604cbb2bc404cb02f7d5/eventdata/track.json#L15-L22

Without seeing the changes you've done it's hard to understand what's gone wrong. If you've forked the repo and make changes there you can paste a link so that we can see in more details.

If you want to work with the eventdata track and iterate doing ad-hoc changes without Rally complaining that the repo is mutated, you can checkout the Rally eventdata track (git clone https://github.com/elastic/rally-eventdata-track) to a dedicated directory and then use the Rally cli flag --track-path to specify the path to the directory.

1 Like

Thanks Dimitrios, this is very helpful!

I no longer get these warnings:
[WARNING] Local changes in [/home/ec2-user/.rally/benchmarks/tracks/eventdata] prevent tracks update from remote. Please commit your changes.

But I still run into errors. Such as this one:
[INFO] Preparing for race ...
[ERROR] Cannot race. Error in track preparator (('Could not load component [eventdata]', None))

The only thing I did, for instance here, was to add one line of code at the top of randomevent.py:
from json2xml import json2xml

Please provide the exact changes done to the track (e.g. with git diff). Otherwise it's not possible to figure things out.

Thanks, I figured out a workaround, so no worries here.

But another set of questions.
Here's the results for a Rally race I ran, using the eventdata-track (see below). I need specific answers for the following questions:

  1. How many records did we index?
  2. How long did it take to index them all?
  3. What queries did we run?
  4. How long did the queries take?
  1. Since Rally sets up and tears down Elasticsearch at the start/end of each Rally race, can we find out the actual indexing and search times minus the setup/teardown times?

Thanks,
David Marsh

[ec2-user@ip-172-30-2-208 ~]$ esrally --distribution-version=7.6.0 --track=eventdata --track-repository=eventdata --challenge=index-logs-fixed-daily-volume --track-params=./params-file.json     [25/1730]

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Preparing for race ...
[INFO] Racing on track [eventdata], challenge [index-logs-fixed-daily-volume] and car ['defaults'] with version [7.6.0].

Running delete-index-elasticlogs-*                                             [100% done]
Running delete-index-template                                                  [100% done]
Running create-index-template                                                  [100% done]
Running bulk-index-logs                                                        [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |            Task |     Value |   Unit |
|---------------------------------------------------------------:|----------------:|----------:|-------:|
|                     Cumulative indexing time of primary shards |                 |   1553.84 |    min |
|             Min cumulative indexing time across primary shards |                 |   1553.84 |    min |
|          Median cumulative indexing time across primary shards |                 |   1553.84 |    min |
|             Max cumulative indexing time across primary shards |                 |   1553.84 |    min |
|            Cumulative indexing throttle time of primary shards |                 |         0 |    min |
|    Min cumulative indexing throttle time across primary shards |                 |         0 |    min |
| Median cumulative indexing throttle time across primary shards |                 |         0 |    min |
|    Max cumulative indexing throttle time across primary shards |                 |         0 |    min |
|                        Cumulative merge time of primary shards |                 |   564.785 |    min |
|                       Cumulative merge count of primary shards |                 |      4015 |        |
|                Min cumulative merge time across primary shards |                 |   564.785 |    min |
|             Median cumulative merge time across primary shards |                 |   564.785 |    min |
|                Max cumulative merge time across primary shards |                 |   564.785 |    min |
|               Cumulative merge throttle time of primary shards |                 |   269.399 |    min |
|       Min cumulative merge throttle time across primary shards |                 |   269.399 |    min |
|    Median cumulative merge throttle time across primary shards |                 |   269.399 |    min |
|       Max cumulative merge throttle time across primary shards |                 |   269.399 |    min |
|                      Cumulative refresh time of primary shards |                 |   15.5074 |    min |
|                     Cumulative refresh count of primary shards |                 |      3720 |        |
|              Min cumulative refresh time across primary shards |                 |   15.5074 |    min |
|           Median cumulative refresh time across primary shards |                 |   15.5074 |    min |
|              Max cumulative refresh time across primary shards |                 |   15.5074 |    min |
|                        Cumulative flush time of primary shards |                 |   13.5334 |    min |
|                       Cumulative flush count of primary shards |                 |       857 |        |
|                Min cumulative flush time across primary shards |                 |   13.5334 |    min |
|             Median cumulative flush time across primary shards |                 |   13.5334 |    min |
|                Max cumulative flush time across primary shards |                 |   13.5334 |    min |
|                                             Total Young Gen GC |                 |   532.843 |      s |
|                                               Total Old Gen GC |                 |    47.813 |      s |
|                                                     Store size |                 |   79.0253 |     GB |
|                                                  Translog size |                 | 0.0771942 |     GB |
|                                         Heap used for segments |                 |   170.779 |     MB |
|                                       Heap used for doc values |                 | 0.0350456 |     MB |
|                                            Heap used for terms |                 |   150.586 |     MB |
|                                            Heap used for norms |                 |         0 |     MB |
|                                           Heap used for points |                 |         0 |     MB |
|                                    Heap used for stored fields |                 |   20.1576 |     MB |
|                                                  Segment count |                 |        71 |        |
|                                                 Min Throughput | bulk-index-logs |    731.66 | docs/s |
|                                              Median Throughput | bulk-index-logs |   27602.1 | docs/s |
|                                                 Max Throughput | bulk-index-logs |   28069.8 | docs/s |
|                                        50th percentile latency | bulk-index-logs |    234.33 |     ms |
|                                        90th percentile latency | bulk-index-logs |   284.378 |     ms |
|                                        99th percentile latency | bulk-index-logs |   559.605 |     ms |
|                                      99.9th percentile latency | bulk-index-logs |   948.532 |     ms |
|                                     99.99th percentile latency | bulk-index-logs |   1203.05 |     ms |
|                                       100th percentile latency | bulk-index-logs |   2816.92 |     ms |
|                                   50th percentile service time | bulk-index-logs |    234.33 |     ms |
|                                   90th percentile service time | bulk-index-logs |   284.378 |     ms |
|                                   99th percentile service time | bulk-index-logs |   559.605 |     ms |
|                                 99.9th percentile service time | bulk-index-logs |   948.532 |     ms |
|                                99.99th percentile service time | bulk-index-logs |   1203.05 |     ms |
|                                  100th percentile service time | bulk-index-logs |   2816.92 |     ms |
|                                                     error rate | bulk-index-logs |         0 |      % |


-----------------------------------
[INFO] SUCCESS (took 15299 seconds)
-----------------------------------

First, you seem to be running Rally on the same machine where Elasticsearch runs: this is guaranteed to skew your results. Please take a look at the blog article for some tips on methodology: https://www.elastic.co/blog/seven-tips-for-better-elasticsearch-benchmarks

For most of the questions you asked the information can be found by configuring an Elasticsearch remote metrics store as documented in https://esrally.readthedocs.io/en/stable/configuration.html#advanced-configuration

Then you can inspect the rally-metrics where lots of records are stored throughout the execution of each task; details in https://esrally.readthedocs.io/en/stable/metrics.html

There is also a separate index rally-results containing summary metrics for various tasks.

For some of your questions:

  1. How many records did we index?

For most tracks using a static corpus (like the ones in github.com/elastic/rally-tracks) this is normally evident from the corpora definition e.g. in https://github.com/elastic/rally-tracks/blob/a31078e4eb53032a94a9604cbb2bc404cb02f7d5/geonames/track.json#L19

The Rally eventdata track (as you've probably realized since you've been attempting to modify the parameter source) is generating data on the fly and the specific channel index-logs-fixed-daily-volume aim to simulate ingesting a certain size of data per day (default here)

Using an Elasticsearch metrics store can help you there.
As an example I looked at the geonames results for one race and this is how it looks on Kibana:

so there are 2280 samples and I used this aggregation to calculate the sum of docs:
(note: the samples contain both warmup and normal samples)

POST /rally-metrics-2020-04/_search?size=0
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "operation-type": "Bulk"
          }
        },
        {
          "term": {
            "race-id": "96270b3b-3510-4520-883c-b2e7371913f7"
          }
        },
        {
          "term": {
            "track": "geonames"
          }
        },
        {
          "term": {
            "name": "service_time"
          }
        },
        {
          "term": {
            "meta.success": true
          }
        },
        {
          "term": {
            "meta.tag_license": "oss"
          }
        }
      ]
    }
  },
  "aggs" : {
        "doccount" : { "sum" : { "field" : "meta.success-count" } }
    }
}

returning:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2280,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "doccount": {
      "value": 11396503
    }
  }
}

(i.e.11396503 docs pushed) matching what we know about the corpora size of geonames https://github.com/elastic/rally-tracks/blob/a31078e4eb53032a94a9604cbb2bc404cb02f7d5/geonames/track.json#L19

You can use additional visualizations and similar aggregations to answer more questions.

Regarding queries, the specific challenge you ran doesn't include queries, it's indexing only.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.