Increasing data size of existing track nyc_taxis

ven100 · March 23, 2020, 11:03pm

To increase the data size of the nyc_taxis track as per this suggestion, I created a new custom track folder, set the index count to 5, and copied all the other files and directories from the default "nyc_taxis" track into it. When I run it, I get the error below. The suggestion is from 2018; is this trick still valid in the latest Rally (1.4.0)? Any pointers or suggestions?

Command run:

esrally --distribution-version=7.5.2 --target-hosts=192.168.20.4:39200,192.168.20.169:39200 --on-error=abort --track-path=~/.rally/benchmarks/tracks/default/nyc_taxis_many/track.json

Error:

2020-03-23 21:29:41,826 ActorAddr-(T|:46634)/PID:8306 esrally.utils.modules DEBUG Adding [/home/elastic/.rally/benchmarks/tracks/default] to Python load path.
2020-03-23 21:29:41,826 ActorAddr-(T|:46634)/PID:8306 esrally.utils.modules DEBUG Loading module [nyc_taxis_many.track]
2020-03-23 21:29:41,838 ActorAddr-(T|:46634)/PID:8306 esrally.driver.runner DEBUG Registering runner function [<function wait_for_ml_lookback at 0x7f295ded2bf8>] for [wait-for-ml-lookback].
2020-03-23 21:29:41,839 ActorAddr-(T|:46634)/PID:8306 esrally.actor ERROR Error in track preparator
Traceback (most recent call last):
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/actor.py", line 85, in guard
    return f(self, msg, sender)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/driver/driver.py", line 331, in receiveMsg_PrepareTrack
    track.prepare_track(msg.track, cfg)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/loader.py", line 345, in prepare_track
    for corpus in used_corpora(t, cfg):
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/loader.py", line 327, in used_corpora
    param_source = operation_parameters(t, sub_task)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/loader.py", line 318, in operation_parameters
    return params.param_source_for_operation(op.type, t, op.params, task.name)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/params.py", line 38, in param_source_for_operation
    return __PARAM_SOURCES_BY_OP[op_type](track, params, operation_name=task_name)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/params.py", line 368, in __init__
    raise exceptions.InvalidSyntax("'index' is mandatory and is missing for operation '{}'".format(kwargs.get("operation_name")))
esrally.exceptions.InvalidSyntax: ("'index' is mandatory and is missing for operation 'default'", None)

track.json (index count updated to 5):

{% import "rally.helpers" as rally with context %}
{% set index_count = 5 %}
{
  "version": 2,
  "description": "Taxi rides in New York in 2015",
  "indices": [
    {% set comma = joiner() %}
    {% for item in range(index_count) %}
    {{ comma() }}
    {
      "name": "nyc_taxis-{{item}}",
      "body": "index.json",
      "types": [ "type" ],
      "auto-managed": false
    }
    {% endfor %}
  ],
  "corpora": [
    {
      "name": "nyc_taxis",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/nyc_taxis",
      "documents": [
        {% set comma = joiner() %}
        {% for item in range(index_count) %}
        {{ comma() }}
        {
          "target-index": "nyc_taxis-{{item}}",
          "target-type": "type",
          "source-file": "documents.json.bz2",
          "#COMMENT": "ML benchmark rely on the fact that the document count stays constant.",
          "document-count": 165346692,
          "compressed-bytes": 4812721501,
          "uncompressed-bytes": 79802445255
        }
        {% endfor %}
      ]
    }
  ],
  "operations": [
    {{ rally.collect(parts="operations/*.json") }}
  ],
  "challenges": [
    {{ rally.collect(parts="challenges/*.json") }}
  ]
}
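Because every entry under "documents" points at the same documents.json.bz2 with the same document count, this template multiplies the indexing workload by index_count. A quick back-of-the-envelope check in Python, using only the per-copy figures from the track.json above (it estimates raw input volume only; on-disk size in Elasticsearch will differ because of replicas, compression and mappings):

```python
# Back-of-the-envelope input volume for the five-index track.json above.
# Per-copy figures are copied from its "documents" entry; this ignores
# replicas, compression and mapping overhead on the Elasticsearch side.

INDEX_COUNT = 5  # matches {% set index_count = 5 %}
DOCS_PER_COPY = 165346692
UNCOMPRESSED_BYTES_PER_COPY = 79802445255

# Each documents entry indexes the same source file into its own index,
# so the totals scale linearly with the number of indices.
total_docs = INDEX_COUNT * DOCS_PER_COPY
total_bytes = INDEX_COUNT * UNCOMPRESSED_BYTES_PER_COPY

print("documents to index: {:,}".format(total_docs))
print("uncompressed JSON:  {:.1f} GB".format(total_bytes / 1e9))
```

With index_count = 5 this comes to roughly 827 million documents and about 399 GB of uncompressed source data.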

danielmitterdorfer · March 27, 2020, 7:03am

Hi,

the relevant bit of the error message is:

'index' is mandatory and is missing for operation 'default'

Rally tries to derive the target index for the task named "default", but it can only do that if the track defines exactly one index (in your case there are five). With multiple indices you have to provide the index name explicitly via the operation's "index" parameter (you can use _all if you want to target all indices). See our docs for details.

Daniel
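To make the rule above concrete, here is a simplified Python model of the index resolution described in the reply. It is a hypothetical sketch, not Rally's source code: resolve_target_index and the local InvalidSyntax class are illustrative names, and the match_all body is only a placeholder. It shows that an operation without an explicit "index" only resolves when the track declares exactly one index, and that adding "index": "_all" to the operation (presumably the "default" search operation in one of the copied operations/*.json files) avoids the error.

```python
# Hypothetical, simplified model of the rule described above: a search
# operation needs an explicit "index" unless the track defines exactly one
# index. This is an illustration, NOT Rally's implementation.


class InvalidSyntax(Exception):
    """Local stand-in for esrally.exceptions.InvalidSyntax."""


def resolve_target_index(operation, track_indices):
    """Return the index name a search operation would target.

    operation     -- dict of operation parameters, as in operations/*.json
    track_indices -- index names declared under "indices" in track.json
    """
    explicit = operation.get("index")
    if explicit:
        # An explicit "index" parameter always wins, e.g. "_all".
        return explicit
    if len(track_indices) == 1:
        # Exactly one index: the target can be derived unambiguously.
        return track_indices[0]
    # Several (or zero) indices: nothing to derive, so fail with the same
    # message as in the traceback.
    raise InvalidSyntax(
        "'index' is mandatory and is missing for operation '{}'".format(operation["name"])
    )


# Five indices, as produced by {% set index_count = 5 %} in track.json.
indices = ["nyc_taxis-{}".format(i) for i in range(5)]

# A search operation without an "index" parameter (placeholder query body).
default_op = {"name": "default", "operation-type": "search",
              "body": {"query": {"match_all": {}}}}

# The same operation with an explicit target.
fixed_op = dict(default_op, index="_all")

try:
    resolve_target_index(default_op, indices)
except InvalidSyntax as e:
    print("error:", e)
print("target:", resolve_target_index(fixed_op, indices))
```

With five indices the first call fails with the message from the traceback, while the second resolves to "_all". Any other search operation in the copied files that lacks an "index" parameter would presumably need the same change.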

ven100 · April 7, 2020, 9:05pm

Thanks Daniel, I will try your suggestion.

system · May 5, 2020, 9:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
