Hi @Alp1,
You can apply a trick so that Rally indexes the data into multiple indices, but you need to create your own track for that. I suggest using the latest version (0.9.1 at the time of writing), because Rally 0.9.0 introduced the concept of "document corpora", which lets you reuse document corpora from other tracks. Here is a complete example that bulk-indexes the nyc_taxis document corpus ten times (note the index_count variable at the top):
{% set index_count = 10 %}
{
  "version": 2,
  "description": "Taxi rides in New York in 2015",
  "indices": [
    {% set comma = joiner() %}
    {% for item in range(index_count) %}
    {{ comma() }}
    {
      "name": "nyc_taxis-{{item}}",
      "body": "index.json",
      "types": [ "type" ],
      "auto-managed": false
    }
    {% endfor %}
  ],
  "corpora": [
    {
      "name": "nyc_taxis",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/nyc_taxis",
      "documents": [
        {% set comma = joiner() %}
        {% for item in range(index_count) %}
        {{ comma() }}
        {
          "target-index": "nyc_taxis-{{item}}",
          "target-type": "type",
          "source-file": "documents.json.bz2",
          "document-count": 165346692,
          "compressed-bytes": 4812721501,
          "uncompressed-bytes": 79802445255
        }
        {% endfor %}
      ]
    }
  ],
  "challenge": {
    "name": "bulk-index",
    "schedule": [
      {
        "operation": "delete-index"
      },
      {
        "operation": {
          "operation-type": "create-index",
          "settings": {
            "index.number_of_replicas": 0
          }
        }
      },
      {
        "name": "check-cluster-health",
        "operation": {
          "operation-type": "cluster-health",
          "index": "nyc_taxis-*",
          "request-params": {
            "wait_for_status": "{{cluster_health | default('green')}}",
            "wait_for_no_relocating_shards": "true"
          }
        }
      },
      {
        "operation": {
          "name": "index-append",
          "operation-type": "bulk",
          "bulk-size": {{bulk_size | default(10000)}}
        },
        "clients": 8,
        "warmup-time-period": 0
      },
      {
        "operation": "refresh",
        "clients": 1
      },
      {
        "operation": "force-merge",
        "clients": 1
      }
    ]
  }
}
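To make it easier to see what the two Jinja2 loops in the track expand to, here is a minimal Python sketch that builds the same "indices" and "documents" lists for a given index_count. It only mirrors the template for illustration; the function name expand_track_lists is hypothetical and not part of Rally, and Rally itself renders the track with Jinja2 rather than with code like this.

```python
import json


def expand_track_lists(index_count):
    """Mimic the two template loops: one index and one corpus entry per item."""
    indices = [
        {
            "name": f"nyc_taxis-{item}",
            "body": "index.json",
            "types": ["type"],
            "auto-managed": False,
        }
        for item in range(index_count)
    ]
    # Every entry points at the same source file but targets a different index.
    documents = [
        {
            "target-index": f"nyc_taxis-{item}",
            "target-type": "type",
            "source-file": "documents.json.bz2",
            "document-count": 165346692,
            "compressed-bytes": 4812721501,
            "uncompressed-bytes": 79802445255,
        }
        for item in range(index_count)
    ]
    return indices, documents


if __name__ == "__main__":
    indices, documents = expand_track_lists(3)
    print(json.dumps([entry["name"] for entry in indices]))
    print(json.dumps([entry["target-index"] for entry in documents]))
```

Each generated index receives the full corpus, so the total number of documents indexed grows linearly with index_count.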
Store this as, e.g., nyc_taxis.json and run it with, for example:
esrally --distribution-version=6.1.1 --on-error=abort --track-path=/path/to/nyc_taxis.json
Note that you also need to store the index definition from https://github.com/elastic/rally-tracks/blob/master/nyc_taxis/index.json in the same directory as nyc_taxis.json for this to work.
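Since a missing index.json is an easy mistake to make, a small pre-flight check can help. The following is only a sketch: build_esrally_command is a hypothetical helper, not something Rally provides. It verifies that index.json sits next to the track file and then assembles the same command line as above.

```python
from pathlib import Path


def build_esrally_command(track_path, distribution_version="6.1.1"):
    """Return the esrally argument list after checking the track directory."""
    track = Path(track_path)
    index_body = track.parent / "index.json"
    if not track.is_file():
        raise FileNotFoundError(f"track file not found: {track}")
    if not index_body.is_file():
        # The track refers to "index.json" by name, so it must be a sibling file.
        raise FileNotFoundError(f"index.json must sit next to the track file: {index_body}")
    return [
        "esrally",
        f"--distribution-version={distribution_version}",
        "--on-error=abort",
        f"--track-path={track}",
    ]
```

The returned list can be passed to subprocess.run(..., check=True) to start the benchmark.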
Alternatively, you can use the eventdata track from Christian Dahlqvist, which uses generated data but allows you to create arbitrarily large indices.