Increase data size in Rally existing tracks

Alp1 · January 22, 2018, 5:20pm

Hi Daniel,

I am using Rally's existing tracks for benchmarking. I noticed that nyc_taxis is the largest track, with 4.5 GB of compressed and 74.3 GB of uncompressed documents. I want to test with a larger data volume. Is there an option in Rally to duplicate or triplicate the data in the existing tracks?

danielmitterdorfer · January 23, 2018, 8:05am

Hi @Alp1,

You can apply a trick so that Rally indexes the data into multiple indices, but you need to create your own track for that. I suggest that you use the latest version (0.9.1), because we recently introduced the concept of "document corpora" in Rally 0.9.0. This feature allows you to reuse document corpora from other tracks. Here is a complete example that bulk-indexes the nyc_taxis document corpus ten times (note the index_count variable at the top):

{% set index_count = 10 %}
{
  "version": 2,
  "description": "Taxi rides in New York in 2015",
  "indices": [
  {% set comma = joiner() %}
  {% for item in range(index_count) %}
  {{ comma() }}
    {
      "name": "nyc_taxis-{{item}}",
      "body": "index.json",
      "types": [ "type" ],
      "auto-managed": false
    }
  {% endfor %}
  ],
  "corpora": [
    {
      "name": "nyc_taxis",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/nyc_taxis",
      "documents": [
      {% set comma = joiner() %}
      {% for item in range(index_count) %}
      {{ comma() }}
        {
          "target-index": "nyc_taxis-{{item}}",
          "target-type": "type",
          "source-file": "documents.json.bz2",
          "document-count": 165346692,
          "compressed-bytes": 4812721501,
          "uncompressed-bytes": 79802445255
        }
      {% endfor %}
      ]
    }
  ],
  "challenge": {
      "name": "bulk-index",
      "schedule": [
        {
          "operation": "delete-index"
        },
        {
          "operation": {
            "operation-type": "create-index",
            "settings": {
              "index.number_of_replicas": 0
            }
          }
        },
        {
          "name": "check-cluster-health",
          "operation": {
            "operation-type": "cluster-health",
            "index": "nyc_taxis-*",
            "request-params": {
              "wait_for_status": "{{cluster_health | default('green')}}",
              "wait_for_no_relocating_shards": "true"
            }
          }
        },
        {
          "operation": {
            "name": "index-append",
            "operation-type": "bulk",
            "bulk-size": {{bulk_size | default(10000)}}
          },
          "clients": 8,
          "warmup-time-period": 0
        },
        {
          "operation": "refresh",
          "clients": 1
        },
        {
          "operation": "force-merge",
          "clients": 1
        }
      ]
    }
}
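The two {% for %} loops, together with joiner() (which returns an empty string on its first call and the separator afterwards), only serve to emit one index entry and one corpus document entry per copy without leaving a trailing comma. If you would rather generate a static track file, here is a minimal sketch in Python that builds the same structure with the json module. It assumes the template defaults are baked in (bulk_size 10000, cluster_health "green"); expand_track and total_volume are hypothetical helper names, not part of Rally. Summing the per-entry metadata also shows what index_count means in terms of data volume.

```python
import json

# Per-copy corpus metadata, copied verbatim from the nyc_taxis track above.
BASE_URL = "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/nyc_taxis"
CORPUS_META = {
    "source-file": "documents.json.bz2",
    "document-count": 165346692,
    "compressed-bytes": 4812721501,
    "uncompressed-bytes": 79802445255,
}


def expand_track(index_count, bulk_size=10000, cluster_health="green"):
    """Hypothetical helper: build a dict equivalent to the rendered template."""
    indices, documents = [], []
    for item in range(index_count):
        name = f"nyc_taxis-{item}"
        indices.append({"name": name, "body": "index.json",
                        "types": ["type"], "auto-managed": False})
        # Every copy reads the same source file but targets its own index.
        documents.append({"target-index": name, "target-type": "type", **CORPUS_META})
    schedule = [
        {"operation": "delete-index"},
        {"operation": {"operation-type": "create-index",
                       "settings": {"index.number_of_replicas": 0}}},
        {"name": "check-cluster-health",
         "operation": {"operation-type": "cluster-health", "index": "nyc_taxis-*",
                       "request-params": {"wait_for_status": cluster_health,
                                          "wait_for_no_relocating_shards": "true"}}},
        {"operation": {"name": "index-append", "operation-type": "bulk",
                       "bulk-size": bulk_size},
         "clients": 8, "warmup-time-period": 0},
        {"operation": "refresh", "clients": 1},
        {"operation": "force-merge", "clients": 1},
    ]
    return {
        "version": 2,
        "description": "Taxi rides in New York in 2015",
        "indices": indices,
        "corpora": [{"name": "nyc_taxis", "base-url": BASE_URL, "documents": documents}],
        "challenge": {"name": "bulk-index", "schedule": schedule},
    }


def total_volume(track):
    """Sum document count and uncompressed bytes over all corpus entries."""
    docs = [d for corpus in track["corpora"] for d in corpus["documents"]]
    return (sum(d["document-count"] for d in docs),
            sum(d["uncompressed-bytes"] for d in docs))


if __name__ == "__main__":
    track = expand_track(10)
    count, size = total_volume(track)
    print(f"{count:,} documents, {size / 2**30:.1f} GiB uncompressed")
    # json.dumps(track, indent=2) yields the text of a static track file.
```

For ten copies this reports about 1.65 billion documents and roughly 743 GiB of uncompressed JSON. Note that a static file generated this way no longer lets you override bulk_size or cluster_health through template variables.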

Store this as, e.g., nyc_taxis.json and run it with, for example, esrally --distribution-version=6.1.1 --on-error=abort --track-path=/path/to/nyc_taxis.json. Note that you also need to store the index definition from https://github.com/elastic/rally-tracks/blob/master/nyc_taxis/index.json in the same directory as nyc_taxis.json for this to work.
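Because a missing index.json next to the track file is an easy mistake, a small pre-flight check can catch it before Rally starts. This is a minimal sketch: esrally_args is a hypothetical helper that only verifies the directory layout described above and assembles the exact flags quoted in the post (--distribution-version, --on-error, --track-path); it does not call any Rally API.

```python
from pathlib import Path


def esrally_args(track_file, distribution_version="6.1.1"):
    """Hypothetical helper: check the track layout, then build the esrally argv."""
    track_path = Path(track_file).resolve()
    if not track_path.is_file():
        raise FileNotFoundError(f"track file not found: {track_path}")
    # Per the post, index.json must sit in the same directory as the track file.
    index_body = track_path.parent / "index.json"
    if not index_body.is_file():
        raise FileNotFoundError(
            f"{index_body} is missing; copy index.json from the rally-tracks "
            "nyc_taxis directory next to the track file")
    return [
        "esrally",
        f"--distribution-version={distribution_version}",
        "--on-error=abort",
        f"--track-path={track_path}",
    ]
```

You could then start the benchmark with subprocess.run(esrally_args("/path/to/nyc_taxis.json"), check=True), which raises if esrally exits with a non-zero status.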

Alternatively, you can use the eventdata track from Christian Dahlqvist, which uses generated data but allows you to create arbitrarily large indices.

Alp1 · January 23, 2018, 7:19pm

Perfect. Thanks, Daniel.
I will try this trick and will keep you posted.

system · February 20, 2018, 7:19pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Increase data size in Rally with existing tracks (Elasticsearch, rally) · 4 replies · 704 views · December 9, 2019
Multiple indices are indexing in sequence (Elasticsearch, rally) · 2 replies · 708 views · May 7, 2019
Benchmarking ES cluster using larger Rally dataset for multiple parallel indexing (Elasticsearch, rally) · 5 replies · 872 views · July 5, 2019
Increasing data size of existing track nyc_taxis (Elasticsearch, rally) · 3 replies · 652 views · May 5, 2020
Scalability issue - Rally benchmark on ES 7.0.1 (Elasticsearch, rally) · 7 replies · 1191 views · July 2, 2019