Increasing data size of existing track nyc_taxis

To create a larger data set, as per this suggestion, I created a new custom track folder and set the index count to 5. I also copied all the other files and directories from the default "nyc_taxis" track into this new track folder and ran it, but I get the error below. The suggestion was made in 2018; is this trick still valid in the latest Rally, 1.4.0? Any pointers or suggestions?

Command I ran:

esrally --distribution-version=7.5.2 --target-hosts=192.168.20.4:39200,192.168.20.169:39200 --on-error=abort --track-path=~/.rally/benchmarks/tracks/default/nyc_taxis_many/track.json

Error:

2020-03-23 21:29:41,826 ActorAddr-(T|:46634)/PID:8306 esrally.utils.modules DEBUG Adding [/home/elastic/.rally/benchmarks/tracks/default] to Python load path.
2020-03-23 21:29:41,826 ActorAddr-(T|:46634)/PID:8306 esrally.utils.modules DEBUG Loading module [nyc_taxis_many.track]
2020-03-23 21:29:41,838 ActorAddr-(T|:46634)/PID:8306 esrally.driver.runner DEBUG Registering runner function [<function wait_for_ml_lookback at 0x7f295ded2bf8>] for [wait-for-ml-lookback].
2020-03-23 21:29:41,839 ActorAddr-(T|:46634)/PID:8306 esrally.actor ERROR Error in track preparator
Traceback (most recent call last):
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/actor.py", line 85, in guard
    return f(self, msg, sender)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/driver/driver.py", line 331, in receiveMsg_PrepareTrack
    track.prepare_track(msg.track, cfg)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/loader.py", line 345, in prepare_track
    for corpus in used_corpora(t, cfg):
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/loader.py", line 327, in used_corpora
    param_source = operation_parameters(t, sub_task)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/loader.py", line 318, in operation_parameters
    return params.param_source_for_operation(op.type, t, op.params, task.name)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/params.py", line 38, in param_source_for_operation
    return __PARAM_SOURCES_BY_OP[op_type](track, params, operation_name=task_name)
  File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/esrally/track/params.py", line 368, in __init__
    raise exceptions.InvalidSyntax("'index' is mandatory and is missing for operation '{}'".format(kwargs.get("operation_name")))
esrally.exceptions.InvalidSyntax: ("'index' is mandatory and is missing for operation 'default'", None)

track.json (index count updated to 5):

{% import "rally.helpers" as rally with context %}
    {% set index_count = 5 %}
    {
      "version": 2,
      "description": "Taxi rides in New York in 2015",
      "indices": [
      {% set comma = joiner() %}
      {% for item in range(index_count) %}
      {{ comma() }}
        {
          "name": "nyc_taxis-{{item}}",
          "body": "index.json",
          "types": [ "type" ],
          "auto-managed": false
        }
      {% endfor %}
      ],
      "corpora": [
        {
          "name": "nyc_taxis",
          "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/nyc_taxis",
          "documents": [
          {% set comma = joiner() %}
          {% for item in range(index_count) %}
          {{ comma() }}
            {
              "target-index": "nyc_taxis-{{item}}",
              "target-type": "type",
              "source-file": "documents.json.bz2",
              "#COMMENT": "ML benchmark rely on the fact that the document count stays constant.",
              "document-count": 165346692,
              "compressed-bytes": 4812721501,
              "uncompressed-bytes": 79802445255
            }
          {% endfor %}
          ]
        }
      ],
      "operations": [
        {{ rally.collect(parts="operations/*.json") }}
      ],
      "challenges": [
        {{ rally.collect(parts="challenges/*.json") }}
      ]
    }

Hi,

the relevant bit of the error message is:

'index' is mandatory and is missing for operation 'default'

Rally attempts to derive the target index for the task named default, but it only does that if the track defines exactly one index (in your case there are five). If there are multiple indices, you have to provide the index name explicitly (you can use _all if you want to target all indices). See our docs for details.
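
As a rough sketch of that suggestion, the search operation named default in the copied operations/default.json could name its target explicitly. The body shown here is an assumption about what the copied nyc_taxis operation looks like (a plain match_all query); the only addition is the "index" property, and "_all" could presumably be swapped for a pattern such as nyc_taxis-* if only those indices should be queried.

```json
{
  "name": "default",
  "operation-type": "search",
  "index": "_all",
  "body": {
    "query": {
      "match_all": {}
    }
  }
}
```

Any other search operations in the same file would likely hit the same error once default is fixed, so they would need the same "index" property.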

Daniel

Thanks, Daniel. I will try your suggestion.
