Hi
I am using Rally's existing tracks to perform benchmarking. I noticed that nyc_taxis is the largest track, with 4.5 GB of compressed and 74.3 GB of uncompressed documents. I want to test with a larger data volume.
I have used the following trick: I duplicated the nyc_taxis document corpus ten times (note the index_count variable at the top of the track file):
{% set index_count = 10 %}
{
  "version": 2,
  "description": "Taxi rides in New York in 2015",
  "indices": [
    {% set comma = joiner() %}
    {% for item in range(index_count) %}
    {{ comma() }}
    {
      "name": "nyc_taxis-{{item}}",
      "body": "index.json",
      "types": [ "type" ],
      "auto-managed": false
    }
    {% endfor %}
  ],
  "corpora": [
    {
      "name": "nyc_taxis",
      "base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/nyc_taxis",
      "documents": [
        {% set comma = joiner() %}
        {% for item in range(index_count) %}
        {{ comma() }}
        {# each generated index ingests the full corpus; document-count is the count from the original nyc_taxis track #}
        {
          "source-file": "documents.json.bz2",
          "document-count": 165346692,
          "target-index": "nyc_taxis-{{item}}"
        }
        {% endfor %}
      ]
    }
  ]
}
I got this trick from the link below.
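For context, I run the modified track with a command along these lines (the track path is just an example from my setup; exact flags may differ by Rally version):

esrally --track-path=~/rally-tracks/nyc_taxis_x10 --pipeline=benchmark-only --target-hosts=127.0.0.1:9200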
My concern is: when I use this trick, Elasticsearch ends up with 10 different indices, but they are indexed sequentially, one after the other.
How can I make the indexing of these indices run in parallel, so that the CPU is utilized optimally?
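From the Rally docs I can see there is a parallel element for the challenge schedule. Is something like the following sketch the right direction? The bulk-size of 5000 and 2 clients per index are values I made up, and I am not sure whether the per-task indices filter is the correct way to split the corpus across tasks:

"challenges": [
  {
    "name": "index-in-parallel",
    "default": true,
    "schedule": [
      {
        "parallel": {
          "tasks": [
            {% set comma = joiner() %}
            {% for item in range(index_count) %}
            {{ comma() }}
            {
              "operation": {
                "operation-type": "bulk",
                "indices": ["nyc_taxis-{{item}}"],
                "bulk-size": 5000
              },
              "clients": 2
            }
            {% endfor %}
          ]
        }
      }
    ]
  }
]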