Bulk is too slow

How can I benchmark with my own data? Is there an easy way to do that?

I tried modifying .rally/benchmarks/tracks/default/geonames/track.json, changing the values of mapping and documents to point to my own mapping and data files. Then I ran esrally --distribution-version=5.6.2

But a git error occurred:

Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/esrally/rally.py", line 449, in dispatch_sub_command
race(cfg)
File "/usr/lib/python3.4/site-packages/esrally/rally.py", line 383, in race
with_actor_system(lambda c: racecontrol.run(c), cfg)
File "/usr/lib/python3.4/site-packages/esrally/rally.py", line 403, in with_actor_system
runnable(cfg)
File "/usr/lib/python3.4/site-packages/esrally/rally.py", line 383, in
with_actor_system(lambda c: racecontrol.run(c), cfg)
File "/usr/lib/python3.4/site-packages/esrally/racecontrol.py", line 382, in run
raise e
File "/usr/lib/python3.4/site-packages/esrally/racecontrol.py", line 379, in run
pipeline(cfg)
File "/usr/lib/python3.4/site-packages/esrally/racecontrol.py", line 63, in call
self.target(cfg)
File "/usr/lib/python3.4/site-packages/esrally/racecontrol.py", line 319, in from_distribution
return race(cfg, distribution=True)
File "/usr/lib/python3.4/site-packages/esrally/racecontrol.py", line 281, in race
raise exceptions.RallyError(result.message, result.cause)
esrally.exceptions.RallyError: ('Could not execute benchmark', Cannot update tracks in [/home/f4ct0r/.rally/benchmarks/tracks/default] (Could not checkout branch [5]. Do you have uncommitted changes?).)

Hi @f4ct0r,

I would not recommend changing the built-in tracks but instead start with a new track. There is a complete tutorial in the docs.

You basically need three things:

  1. A data file which contains your test data set in a bulk-friendly format (i.e. one line per document)
  2. A mapping file for your index
  3. A track.json file that describes the workload. If you are just interested in measuring indexing throughput, something along these lines should get you started:
{
  "short-description": "Example track",
  "description": "This track shows how to index some documents with Rally",
  "indices": [
    {
      "name": "your-index-name-here",
      "types": [
        {
          "name": "your-type-name-here",
          "mapping": "your-mapping-file-here.json",
          "documents": "your-documents-file-here.json",
          "document-count": 1000000,
          "uncompressed-bytes": 9999999999
        }
      ]
    }
  ],
  "operations": [
    {
      "name": "index-docs",
      "operation-type": "index",
      "bulk-size": 5000
    }
  ],
  "challenges": [
    {
      "name": "index-only",
      "description": "",
      "default": true,
      "index-settings": {
        "index.number_of_replicas": 0
      },
      "schedule": [
        {
          "operation": "index-docs",
          "warmup-time-period": 120,
          "clients": 8
        }
      ]
    }
  ]
}

Please adjust the values (especially document-count and uncompressed-bytes) accordingly.
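
If it helps, both values can be derived from the data file itself. The following is only a sketch, assuming the file holds exactly one JSON document per line and is stored uncompressed; corpus_stats is a hypothetical helper name, not part of Rally:

```python
import os


def corpus_stats(path):
    """Return (document_count, uncompressed_bytes) for a one-document-per-line file."""
    document_count = 0
    with open(path, "rb") as f:
        for line in f:
            # Count only non-empty lines; each one is assumed to be one JSON document.
            if line.strip():
                document_count += 1
    # The size on disk of the uncompressed file, in bytes.
    return document_count, os.path.getsize(path)
```

Calling corpus_stats("your-documents-file-here.json") returns the two numbers to put into document-count and uncompressed-bytes.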

For details please refer to the docs, especially the tutorial that I've mentioned above.

Finally, you should revert your changes in ~/.rally/benchmarks/tracks/default.

Daniel

You would probably want to increase this to match your current benchmark though.

Hi @danielmitterdorfer, thanks for the reply.

At first, I wrote a track.json based on the complete tutorial in the docs, but I think that example is not valid JSON (it cannot be decoded), and it is missing the operations key between indices and challenges. Or maybe you forgot to update that example?

Anyway, I finally got track.json right. Here it is:

{
  "short-description": "My benchmark for Rally",
  "description": "This test indexes 8.6 million documents (POIs from Geonames) using 50 clients and 1000 docs per bulk request against Elasticsearch",
  "indices": [
    {
      "name": "protocol",
      "types": [
        {
          "name": "http",
          "mapping": "mappings.json",
          "documents": "documents.json",
          "document-count": 10000,
          "uncompressed-bytes": 134190000
        }
      ]
    }
  ],
  "operations": [
    {
      "name": "http-index-append",
      "operation-type": "index",
      "bulk-size": 1000
    }
  ],
  "challenges": [
    {
      "name": "http-index",
      "description": "aaaa",
      "default": true,
      "index-settings": {
        "index.number_of_replicas": 0
      },
      "schedule": [
        {
          "operation": "http-index-append",
          "warmup-time-period": 120,
          "clients": 20
        }
      ]
    }
  ]
}

And here is the report output:

|   Lap |                      Metric |         Operation |       Value |   Unit |
|------:|----------------------------:|------------------:|------------:|-------:|
|   All |               Indexing time |                   |     1.00885 |    min |
|   All |                  Merge time |                   |   0.0383167 |    min |
|   All |                Refresh time |                   |   0.0395333 |    min |
|   All |            Median CPU usage |                   |          23 |      % |
|   All |          Total Young Gen GC |                   |       0.222 |      s |
|   All |            Total Old Gen GC |                   |       0.196 |      s |
|   All |                  Index size |                   |   0.0539314 |     GB |
|   All |             Totally written |                   |    0.295757 |     GB |
|   All |      Heap used for segments |                   |    0.371875 |     MB |
|   All |    Heap used for doc values |                   |  0.00157928 |     MB |
|   All |         Heap used for terms |                   |    0.323584 |     MB |
|   All |         Heap used for norms |                   |   0.0340576 |     MB |
|   All |        Heap used for points |                   | 0.000256538 |     MB |
|   All | Heap used for stored fields |                   |   0.0123978 |     MB |
|   All |               Segment count |                   |          18 |        |
|   All |                  error rate | http-index-append |           0 |      % |

With this track.json, I can't see the indexing rate. Am I missing some configuration? Maybe target-throughput?

I wrote a track.json and ran esrally --distribution-version=5.6.2 --track-path=~/tracks --car="8gheap" successfully.

But I can't find the indexing rate in the report. Am I missing something, like the target-throughput option?

Hi @Christian_Dahlqvist, I read the Track elements documentation and found out why there was no throughput output.

Then I modified my track.json, and the report is:

|   Lap |                        Metric |         Operation |      Value |   Unit |
|------:|------------------------------:|------------------:|-----------:|-------:|
|   All |                 Indexing time |                   |    7.47383 |    min |
|   All |                    Merge time |                   |    0.31495 |    min |
|   All |                  Refresh time |                   |   0.262683 |    min |
|   All |              Median CPU usage |                   |      887.5 |      % |
|   All |            Total Young Gen GC |                   |      1.088 |      s |
|   All |              Total Old Gen GC |                   |      0.263 |      s |
|   All |        Heap used for segments |                   |    1.21115 |     MB |
|   All |      Heap used for doc values |                   | 0.00482559 |     MB |
|   All |           Heap used for terms |                   |    1.00723 |     MB |
|   All |           Heap used for norms |                   |   0.104065 |     MB |
|   All |          Heap used for points |                   | 0.00213337 |     MB |
|   All |   Heap used for stored fields |                   |  0.0928879 |     MB |
|   All |                 Segment count |                   |         55 |        |
|   All |                Min Throughput | http-index-append |     212.75 | docs/s |
|   All |             Median Throughput | http-index-append |       2763 | docs/s |
|   All |                Max Throughput | http-index-append |    3325.69 | docs/s |
|   All |       50th percentile latency | http-index-append |    5059.04 |     ms |
|   All |       90th percentile latency | http-index-append |    6972.22 |     ms |
|   All |       99th percentile latency | http-index-append |    10028.9 |     ms |
|   All |      100th percentile latency | http-index-append |    10490.9 |     ms |
|   All |  50th percentile service time | http-index-append |    5059.04 |     ms |
|   All |  90th percentile service time | http-index-append |    6972.22 |     ms |
|   All |  99th percentile service time | http-index-append |    10028.9 |     ms |
|   All | 100th percentile service time | http-index-append |    10490.9 |     ms |
|   All |                    error rate | http-index-append |          0 |      % |

The max throughput still seems low to me. Is there any way to improve the indexing rate?

Rally processes all data provided in the documents.json file once and then stops. As there are only 10000 documents and your bulk size is 1000, that is only enough to generate 10 bulk requests, which is far too little. I would recommend duplicating the contents of the file so you get a much larger file with at least a few million documents. Once you have enough data to allow the benchmark to run for a while, you will be able to easily change the number of clients and see how it affects performance.
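
One possible way to duplicate the file contents is sketched below. repeat_corpus is a hypothetical helper, not something Rally provides, and the sketch assumes that the documents are indexed with auto-generated IDs, so repeated lines become additional documents rather than overwrites:

```python
def repeat_corpus(src_path, dst_path, copies):
    """Write `copies` back-to-back copies of src_path into dst_path."""
    with open(dst_path, "wb") as dst:
        for _ in range(copies):
            with open(src_path, "rb") as src:
                last_chunk = b""
                # Copy in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    dst.write(chunk)
                    last_chunk = chunk
                # Make sure the last document of one copy and the first document
                # of the next copy do not end up on the same line.
                if last_chunk and not last_chunk.endswith(b"\n"):
                    dst.write(b"\n")
```

After enlarging the file, document-count and uncompressed-bytes in track.json need to be updated to match the new file.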

Hi @f4ct0r

Note that Rally's documentation is versioned. There are two pseudo-versions:

  • stable: This is the documentation for the current release (which is 0.7.4 at the moment).
  • latest: This is the documentation for the release that is currently in development (i.e. the master branch of Rally, which is 0.8.0.dev0 at the moment).

Rally 0.8.0 will bring several simplifications to the track file format, one of them being that you can define operations inline in the challenge instead of defining a separate block. To me that sounds as if you were looking at the latest (i.e. development) docs instead of the stable (i.e. release) docs.

However, glad that you managed to create your first track. As Christian has already mentioned, the document corpus is way too small and you should increase its size. Afterwards, you can vary the number of clients, bulk size and further index settings.

Btw, in your original post you mentioned the following index settings:

"number_of_shards": 5,
"number_of_replicas": 0,
"translog.flush_threshold_size": "1gb",
"translog.durability": "async"

In order to have them used in your track, you need to specify them in the challenge:

{
  "challenges": [
    {
      "name": "http-index",
      "description": "aaaa",
      "default": true,
      "index-settings": {
        "index.number_of_shards": 5,
        "index.number_of_replicas": 0,
        "index.translog.flush_threshold_size": "1gb",
        "index.translog.durability": "async"        
      },
      "schedule": [
        ...
      ]
    }
  ]
}
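
If you prefer to patch the track file with a script rather than by hand, something like the following might work. It is only a sketch: set_index_settings is a hypothetical helper (not part of Rally), and it assumes the track layout shown in this thread, with a challenges list and one challenge marked as default.

```python
import json


def set_index_settings(track, settings):
    """Merge `settings` into the index-settings of the default challenge."""
    for challenge in track.get("challenges", []):
        if challenge.get("default", False):
            # Create the index-settings block if it is missing, then merge.
            challenge.setdefault("index-settings", {}).update(settings)
            return challenge["name"]
    raise ValueError("track has no default challenge")


# Minimal stand-in for a parsed track.json (only the keys this helper touches).
track = json.loads("""
{
  "challenges": [
    {"name": "http-index", "default": true,
     "index-settings": {"index.number_of_replicas": 0}}
  ]
}
""")

set_index_settings(track, {
    "index.number_of_shards": 5,
    "index.translog.durability": "async",
})
print(json.dumps(track["challenges"][0]["index-settings"], indent=2, sort_keys=True))
```

For a real track you would read the file with json.load, apply the helper, and write the result back with json.dump.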

Daniel

I increased the document count to 1000000 and the size to 12G.

And as @danielmitterdorfer mentioned, I specified these in the challenge:

"index.number_of_shards": 5,
"index.number_of_replicas": 0,
"index.translog.flush_threshold_size": "1gb",
"index.translog.durability": "async"

After I set these and ran the test again, I found that the indexing rate is lower than before and also lower than with my own benchmark script. I don't know why.

Here is the latest report:

|   Lap |                         Metric |         Operation |     Value |   Unit |
|------:|-------------------------------:|------------------:|----------:|-------:|
|   All |                  Indexing time |                   |   209.433 |    min |
|   All |                     Merge time |                   |   25.8112 |    min |
|   All |                   Refresh time |                   |   18.7837 |    min |
|   All |                     Flush time |                   |   21.1096 |    min |
|   All |            Merge throttle time |                   |    0.7682 |    min |
|   All |               Median CPU usage |                   |     219.8 |      % |
|   All |             Total Young Gen GC |                   |     8.058 |      s |
|   All |               Total Old Gen GC |                   |     1.223 |      s |
|   All |         Heap used for segments |                   |   3.87121 |     MB |
|   All |       Heap used for doc values |                   | 0.0125465 |     MB |
|   All |            Heap used for terms |                   |   2.83058 |     MB |
|   All |            Heap used for norms |                   |  0.270569 |     MB |
|   All |           Heap used for points |                   | 0.0161858 |     MB |
|   All |    Heap used for stored fields |                   |  0.741325 |     MB |
|   All |                  Segment count |                   |       143 |        |
|   All |                 Min Throughput | http-index-append |    290.58 | docs/s |
|   All |              Median Throughput | http-index-append |   1238.01 | docs/s |
|   All |                 Max Throughput | http-index-append |   2041.33 | docs/s |
|   All |        50th percentile latency | http-index-append |   10027.2 |     ms |
|   All |        90th percentile latency | http-index-append |   30680.4 |     ms |
|   All |        99th percentile latency | http-index-append |   77282.4 |     ms |
|   All |      99.9th percentile latency | http-index-append |    124735 |     ms |
|   All |       100th percentile latency | http-index-append |    140962 |     ms |
|   All |   50th percentile service time | http-index-append |   10027.2 |     ms |
|   All |   90th percentile service time | http-index-append |   30680.4 |     ms |
|   All |   99th percentile service time | http-index-append |   77282.4 |     ms |
|   All | 99.9th percentile service time | http-index-append |    124735 |     ms |
|   All |  100th percentile service time | http-index-append |    140962 |     ms |
|   All |                     error rate | http-index-append |         0 |      % |

Did you increase the number of clients? If so, how many did you use?

1000000 documents gives just 1000 bulk requests. If you have 50 clients that is just 20 requests per client, so I would make sure you use an even larger amount of data.
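
The arithmetic is easy to redo for other combinations. A small sketch (bulk_requests_per_client is just an illustrative name):

```python
import math


def bulk_requests_per_client(document_count, bulk_size, clients):
    """Return (total bulk requests, average bulk requests per client)."""
    total_bulks = math.ceil(document_count / bulk_size)
    return total_bulks, total_bulks / clients


# 1000000 documents, bulk size 1000, 50 clients (the example above)
print(bulk_requests_per_client(1000000, 1000, 50))  # (1000, 20.0)
# The same corpus with the 20 clients from the posted track.json
print(bulk_requests_per_client(1000000, 1000, 20))  # (1000, 50.0)
# The original 10000-document corpus with 20 clients
print(bulk_requests_per_client(10000, 1000, 20))    # (10, 0.5)
```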

You also have a very long warmup period, which may mean that most of your requests do not get included. Set this to 0 and try again.

Hi,

to add to Christian's observations:

I assume that you ran it with esrally --distribution-version=5.6.2 --track-path=~/tracks --car="8gheap" just as before. This means that Rally will spin up an Elasticsearch node on your local machine. Therefore, Rally and Elasticsearch compete for resources (CPU, disk, RAM).

You have two options to avoid this:

  • You set up and start Elasticsearch yourself on another machine and use Rally just as a load generator. You can achieve this with esrally --track-path=~/tracks --pipeline=benchmark-only --target-host="ES_NODE_IP:9200" where you replace ES_NODE_IP with the IP address of the machine on which you start Elasticsearch.
  • You let Rally set up and start Elasticsearch on the remote machine for you. This is doable (see docs) but I think this is much more complex so I'd advise you to use the other option.

What's also a bit odd is that it uses only around 220% CPU (the maximum is 100% * number of CPU cores). Does your system have a spinning disk? The disk that holds your benchmark data file should really be an SSD because each client reads a portion of the data file. This results in random disk accesses, and spinning disks will very likely cause an accidental client-side bottleneck.
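
To put the 220% into perspective, you can relate it to the number of cores. A rough sketch; cpu_utilisation is a made-up helper name, and the 16 cores in the example are purely an assumption, since the core count of the machine has not been mentioned in this thread:

```python
import os


def cpu_utilisation(cpu_percent, cores=None):
    """Fraction of total CPU capacity in use, where capacity is 100% * cores."""
    if cores is None:
        # Falls back to the core count of the machine running this script.
        cores = os.cpu_count() or 1
    return cpu_percent / (100.0 * cores)


# 219.8 is the median CPU usage from the report above; 16 cores is an assumed value.
print("{:.1%}".format(cpu_utilisation(219.8, cores=16)))
```

A value far below 100% suggests that the CPU is not the limiting factor and that the bottleneck is elsewhere, for example disk I/O.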

Daniel

Nope, I did not increase the number of clients. I kept it at 20 the whole time because I run 20 threads in my real environment.

And I had already deleted the warmup-time-period option from my track.json before I ran the test again, so the report I just posted was produced without that option.

Hi Daniel,

I ran the command esrally --distribution-version=5.6.2 --track-path=~/tracks --car="8gheap" on a server on which I had already installed Elasticsearch for performance testing. Is that a problem?

And my disk is an HDD, not an SSD. While Rally was running, the disk write rate sometimes reached the upper limit, but not all the time.

Here is part of the iostat report:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          21.12    0.00    1.16   49.52    0.00   28.21

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             703.00     26224.00     34940.00      26224      34940
dm-0            696.00     24704.00     40324.00      24704      40324
dm-1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.87    0.00    0.47   50.23    0.00   42.44

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             602.00      4424.00     87867.50       4424      87867
dm-0            638.00      4424.00     72343.50       4424      72343
dm-1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.55    0.00    0.35   56.10    0.00   40.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             115.00      1092.00     17007.00       1092      17007
dm-0            103.00      1092.00     27791.00       1092      27791
dm-1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.07    0.00    0.46   58.54    0.00   33.93

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             539.00      4732.00     92196.50       4732      92196
dm-0            571.00      4864.00     92268.50       4864      92268
dm-1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.52    0.00    0.47   61.59    0.00   28.43

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             319.00      7152.00     45803.00       7152      45803
dm-0            325.00      7876.00     17943.00       7876      17943
dm-1              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.84    0.00    0.73   79.48    0.00   12.95

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             216.00      1852.00     10724.00       1852      10724
dm-0            263.00      1444.00     11564.00       1444      11564
dm-1              0.00         0.00         0.00          0          0

Hi,

yes. As I said, you are generating load (Rally) and consuming load (Elasticsearch) on the same hardware. They compete for resources such as disk I/O, CPU and memory. Thus your results may be skewed.

You should run Rally on a separate machine for more accurate results (see my previous post for more details). Also, it is crucial that your data file resides on an SSD, especially with that many clients as I've explained before.

Daniel
