Increase data size in Rally existing tracks

danielmitterdorfer · January 23, 2018, 8:05am

You can apply a trick so that Rally indexes the data into multiple indices, but you need to create your own track for that. I suggest using the latest version (0.9.1 at the time of writing), because Rally 0.9.0 introduced the concept of "document corpora", which lets you reuse document corpora from other tracks. Here is a complete example that bulk-indexes the nyc_taxis document corpus ten times (note the index_count variable at the top):

{% set index_count = 10 %}
{
  "version": 2,
  "description": "Taxi rides in New York in 2015",
  "indices": [
  {% set comma = joiner() %}
  {% for item in range(index_count) %}
  {{ comma() }}
    {
      "name": "nyc_taxis-{{item}}",
      "body": "index.json",
      "types": [ "type" ],
      "auto-managed": false
    }
  {% endfor %}
  ],
  "corpora": [
    {
      "name": "nyc_taxis",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/nyc_taxis",
      "documents": [
      {% set comma = joiner() %}
      {% for item in range(index_count) %}
      {{ comma() }}
        {
          "target-index": "nyc_taxis-{{item}}",
          "target-type": "type",
          "source-file": "documents.json.bz2",
          "document-count": 165346692,
          "compressed-bytes": 4812721501,
          "uncompressed-bytes": 79802445255
        }
      {% endfor %}
      ]
    }
  ],
  "challenge": {
      "name": "bulk-index",
      "schedule": [
        {
          "operation": "delete-index"
        },
        {
          "operation": {
            "operation-type": "create-index",
            "settings": {
              "index.number_of_replicas": 0
            }
          }
        },
        {
          "name": "check-cluster-health",
          "operation": {
            "operation-type": "cluster-health",
            "index": "nyc_taxis-*",
            "request-params": {
              "wait_for_status": "{{cluster_health | default('green')}}",
              "wait_for_no_relocating_shards": "true"
            }
          }
        },
        {
          "operation": {
            "name": "index-append",
            "operation-type": "bulk",
            "bulk-size": {{bulk_size | default(10000)}}
          },
          "clients": 8,
          "warmup-time-period": 0
        },
        {
          "operation": "refresh",
          "clients": 1
        },
        {
          "operation": "force-merge",
          "clients": 1
        }
      ]
    }
}
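
To make it easier to see what the two Jinja2 loops in the track expand to, here is a rough Python sketch that builds the same "indices" and "documents" entries by hand and adds up the resulting data volume. It is only an illustration, not how Rally renders tracks internally; the function name expand_track_entries is made up for this example, and the numbers are copied from the track above.

```python
import json


def expand_track_entries(index_count):
    """Build the entries that the template loops produce (illustrative only)."""
    indices = [
        {
            "name": f"nyc_taxis-{item}",
            "body": "index.json",
            "types": ["type"],
            "auto-managed": False,
        }
        for item in range(index_count)
    ]
    documents = [
        {
            "target-index": f"nyc_taxis-{item}",
            "target-type": "type",
            "source-file": "documents.json.bz2",
            # Sizes copied from the track definition above.
            "document-count": 165346692,
            "compressed-bytes": 4812721501,
            "uncompressed-bytes": 79802445255,
        }
        for item in range(index_count)
    ]
    return indices, documents


indices, documents = expand_track_entries(10)
# Total volume indexed across all target indices.
total_docs = sum(d["document-count"] for d in documents)
total_bytes = sum(d["uncompressed-bytes"] for d in documents)
print(json.dumps([i["name"] for i in indices]))
print(total_docs, total_bytes)
```

With index_count set to 10 this gives ten indices, nyc_taxis-0 through nyc_taxis-9, so plan disk space on the target cluster for roughly ten times the original corpus.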

Store this as, e.g., nyc_taxis.json and run it with, for example, esrally --distribution-version=6.1.1 --on-error=abort --track-path=/path/to/nyc_taxis.json. Note that you also need to store the index definition from https://github.com/elastic/rally-tracks/blob/master/nyc_taxis/index.json in the same directory as nyc_taxis.json for this to work.
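
Since forgetting index.json is an easy mistake, a small pre-flight check can help. The following is a minimal sketch, assuming the layout described above (track file and index.json side by side); build_rally_command is a hypothetical helper, not part of Rally, and it only assembles the command line shown above without running it.

```python
from pathlib import Path


def build_rally_command(track_file, distribution_version="6.1.1"):
    """Check that the track file and index.json exist, then return the esrally argv."""
    track = Path(track_file)
    # The track refers to "index.json" by name, so it must sit next to the track file.
    index_body = track.parent / "index.json"
    missing = [str(p) for p in (track, index_body) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))
    return [
        "esrally",
        f"--distribution-version={distribution_version}",
        "--on-error=abort",
        f"--track-path={track}",
    ]
```

The returned list can be passed to subprocess.run once both files are in place; if either one is missing, the helper raises FileNotFoundError and names the missing path.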

Alternatively, you can use the eventdata track from Christian Dahlqvist, which uses generated data and lets you create arbitrarily large indices.
