Need Help With Analyzing Rally Benchmark Report

Hey, I'm just installing our new Elasticsearch cluster, which has the following hardware specs:

5 HP DL360 G10 nodes, each node is equipped with:

  • 64GB RAM
  • 2x XEON Silver 4216 CPU
  • 1x HP 1.6TB NVMe SSD (hot storage)
  • 10x 2.4TB 10k SAS HDD as RAID6 (warm storage)
  • 2x 10 SFP+ Network (active/standby bond)

We will probably use Docker Swarm to distribute 10 "virtual" nodes: 5 using the hot storage and 5 using the warm storage. The cluster is currently running with default settings and no data stored in it. I just wanted to take a quick look at how it performs, without any special "tuning", using Rally.

I'm not sure I'm interpreting the report correctly, but since the Min/Median/Max throughput metrics all show the same value, I conclude that the client is not fast enough. Otherwise the performance of the cluster would be very poor. What do you think?

esrally --track=http_logs --test-mode --pipeline=benchmark-only --target-hosts=elastic01:9200,elastic02:9200,elastic03:9200,elastic04:9200,elastic05:9200 --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'admin',basic_auth_password:'admin'"


[INFO] Racing on track [http_logs], challenge [append-no-conflicts] and car ['external'] with version [7.3.2].

Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index-append                                                           [100% done]
Running refresh-after-index                                                    [100% done]
Running force-merge                                                            [100% done]
Running refresh-after-force-merge                                              [100% done]
Running default                                                                [100% done]
Running term                                                                   [100% done]
Running range                                                                  [100% done]
Running hourly_agg                                                             [100% done]
Running scroll                                                                 [100% done]
Running desc_sort_timestamp                                                    [100% done]
Running asc_sort_timestamp                                                     [100% done]
Running force-merge-1-seg                                                      [100% done]
Running refresh-after-force-merge-1-seg                                        [100% done]
Running desc-sort-timestamp-after-force-merge-1-seg                            [100% done]
Running asc-sort-timestamp-after-force-merge-1-seg                             [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                        Metric |                                        Task |   Value |    Unit |
|------------------------------:|--------------------------------------------:|--------:|--------:|
|            Total Young Gen GC |                                             |       0 |       s |
|              Total Old Gen GC |                                             |       0 |       s |
|                Min Throughput |                                index-append | 15195.1 |  docs/s |
|             Median Throughput |                                index-append | 15195.1 |  docs/s |
|                Max Throughput |                                index-append | 15195.1 |  docs/s |
|       50th percentile latency |                                index-append | 48.1909 |      ms |
|       90th percentile latency |                                index-append | 75.1623 |      ms |
|      100th percentile latency |                                index-append | 100.025 |      ms |
|  50th percentile service time |                                index-append | 48.1909 |      ms |
|  90th percentile service time |                                index-append | 75.1623 |      ms |
| 100th percentile service time |                                index-append | 100.025 |      ms |
|                    error rate |                                index-append |       0 |       % |
|                Min Throughput |                                     default |   34.08 |   ops/s |
|             Median Throughput |                                     default |   34.08 |   ops/s |
|                Max Throughput |                                     default |   34.08 |   ops/s |
|      100th percentile latency |                                     default | 20.7477 |      ms |
| 100th percentile service time |                                     default | 20.7477 |      ms |
|                    error rate |                                     default |       0 |       % |
|                Min Throughput |                                        term |   30.48 |   ops/s |
|             Median Throughput |                                        term |   30.48 |   ops/s |
|                Max Throughput |                                        term |   30.48 |   ops/s |
|      100th percentile latency |                                        term | 28.5669 |      ms |
| 100th percentile service time |                                        term | 28.5669 |      ms |
|                    error rate |                                        term |       0 |       % |
|                Min Throughput |                                       range |    39.4 |   ops/s |
|             Median Throughput |                                       range |    39.4 |   ops/s |
|                Max Throughput |                                       range |    39.4 |   ops/s |
|      100th percentile latency |                                       range | 29.8074 |      ms |
| 100th percentile service time |                                       range | 29.8074 |      ms |
|                    error rate |                                       range |       0 |       % |
|                Min Throughput |                                  hourly_agg |   23.03 |   ops/s |
|             Median Throughput |                                  hourly_agg |   23.03 |   ops/s |
|                Max Throughput |                                  hourly_agg |   23.03 |   ops/s |
|      100th percentile latency |                                  hourly_agg | 44.4254 |      ms |
| 100th percentile service time |                                  hourly_agg | 44.4254 |      ms |
|                    error rate |                                  hourly_agg |       0 |       % |
|                Min Throughput |                                      scroll |    22.2 | pages/s |
|             Median Throughput |                                      scroll |    22.2 | pages/s |
|                Max Throughput |                                      scroll |    22.2 | pages/s |
|      100th percentile latency |                                      scroll | 319.724 |      ms |
| 100th percentile service time |                                      scroll | 319.724 |      ms |
|                    error rate |                                      scroll |       0 |       % |
|                Min Throughput |                         desc_sort_timestamp |   35.77 |   ops/s |
|             Median Throughput |                         desc_sort_timestamp |   35.77 |   ops/s |
|                Max Throughput |                         desc_sort_timestamp |   35.77 |   ops/s |
|      100th percentile latency |                         desc_sort_timestamp |  24.822 |      ms |
| 100th percentile service time |                         desc_sort_timestamp |  24.822 |      ms |
|                    error rate |                         desc_sort_timestamp |       0 |       % |
|                Min Throughput |                          asc_sort_timestamp |   40.51 |   ops/s |
|             Median Throughput |                          asc_sort_timestamp |   40.51 |   ops/s |
|                Max Throughput |                          asc_sort_timestamp |   40.51 |   ops/s |
|      100th percentile latency |                          asc_sort_timestamp | 25.0751 |      ms |
| 100th percentile service time |                          asc_sort_timestamp | 25.0751 |      ms |
|                    error rate |                          asc_sort_timestamp |       0 |       % |
|                Min Throughput | desc-sort-timestamp-after-force-merge-1-seg |   37.59 |   ops/s |
|             Median Throughput | desc-sort-timestamp-after-force-merge-1-seg |   37.59 |   ops/s |
|                Max Throughput | desc-sort-timestamp-after-force-merge-1-seg |   37.59 |   ops/s |
|      100th percentile latency | desc-sort-timestamp-after-force-merge-1-seg |  26.345 |      ms |
| 100th percentile service time | desc-sort-timestamp-after-force-merge-1-seg |  26.345 |      ms |
|                    error rate | desc-sort-timestamp-after-force-merge-1-seg |       0 |       % |
|                Min Throughput |  asc-sort-timestamp-after-force-merge-1-seg |   35.27 |   ops/s |
|             Median Throughput |  asc-sort-timestamp-after-force-merge-1-seg |   35.27 |   ops/s |
|                Max Throughput |  asc-sort-timestamp-after-force-merge-1-seg |   35.27 |   ops/s |
|      100th percentile latency |  asc-sort-timestamp-after-force-merge-1-seg | 31.2254 |      ms |
| 100th percentile service time |  asc-sort-timestamp-after-force-merge-1-seg | 31.2254 |      ms |
|                    error rate |  asc-sort-timestamp-after-force-merge-1-seg |       0 |       % |


--------------------------------
[INFO] SUCCESS (took 18 seconds)
--------------------------------

Hi,

Did you enable --test-mode intentionally? That mode only runs for a very short time as a quick sanity check, and the numbers it reports are not trustworthy at all (see also our docs).
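To illustrate why identical Min/Median/Max values point to too little data rather than a slow client, here is a minimal sketch. It assumes that the short test-mode run produced only a single throughput sample per task; that assumption is not confirmed by the report itself, although the query tasks reporting only a 100th percentile latency fits it. The second list of samples is made up purely for contrast.

```python
from statistics import median


def summarize(samples):
    """Return (min, median, max) for a list of throughput samples."""
    return min(samples), median(samples), max(samples)


# Assumption: test mode yielded exactly one sample for index-append.
# With a single sample, all three statistics collapse to that value.
single = summarize([15195.1])

# Hypothetical samples from a longer run (invented values, for contrast only).
several = summarize([14200.0, 15195.1, 16050.3])

print(single)
print(several)
```

With more than one distinct sample the three values separate, which is what a full-length run would be expected to show.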

I suggest you remove that flag and run the complete benchmark, which should produce more reasonable numbers. Please also review the parameters for this track and adjust them if needed. For some general benchmarking advice, you can also check our blog post Seven Tips for Better Elasticsearch Benchmarks.
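As a rough sketch of that change, the snippet below takes the argument list from the command posted above and drops only --test-mode, leaving everything else untouched. The helper name `without_test_mode` is hypothetical, and the actual launch is left commented out because a complete run takes far longer than the 18-second test run.

```python
import subprocess  # only needed if the run line below is uncommented

# Arguments exactly as in the original command from this thread.
test_run = [
    "esrally",
    "--track=http_logs",
    "--test-mode",
    "--pipeline=benchmark-only",
    "--target-hosts=elastic01:9200,elastic02:9200,elastic03:9200,"
    "elastic04:9200,elastic05:9200",
    "--client-options=use_ssl:true,verify_certs:false,"
    "basic_auth_user:'admin',basic_auth_password:'admin'",
]


def without_test_mode(argv):
    """Return a copy of argv with the --test-mode flag removed."""
    return [arg for arg in argv if arg != "--test-mode"]


full_run = without_test_mode(test_run)
print(" ".join(full_run))

# Uncomment to start the complete benchmark:
# subprocess.run(full_run, check=True)
```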

Daniel


Oh man, thank you. Stupid mistake :sweat_smile:
