Run Rally races against an ES cluster built on OpenShift

benelastic · November 27, 2023, 9:18am

Here is how I benchmark my existing ES cluster:

I have an existing ES cluster running in an OpenShift environment.
The cluster has 5 nodes, each with exactly the same resources.
The cluster is exposed through a single endpoint, so from outside it behaves like a single instance; OpenShift handles the load balancing.
I created a custom track with my own index data.
I ran the race with a docker command like: docker run --rm -v ${pwd}/esrally/.rally:/rally/.rally -v ${pwd}/esrally/reports:/rally/reports elastic/rally:2.10.0 race --pipeline=benchmark-only --target-host=remote.host.com:80 --track=my-custom-track --challenge=default --report-file=/rally/reports/my-custom-track-rally-report.md --report-format=markdown --on-error=abort --offline
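
For anyone who wants to reproduce the repeated runs, here is a minimal Python sketch of how the command above could be assembled once per round. It only builds and prints the command lines. The helper name `build_race_command`, the per-run suffix on the report file name, and the choice of three rounds are assumptions for illustration, not part of my actual setup; all Rally flags are copied from the command above.

```python
import shlex


def build_race_command(workdir: str, run: int) -> list[str]:
    """Return the docker argv for one race; only the report file name varies."""
    # Hypothetical naming scheme: one report file per round so runs do not overwrite each other.
    report = f"/rally/reports/my-custom-track-rally-report-{run}.md"
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}/esrally/.rally:/rally/.rally",
        "-v", f"{workdir}/esrally/reports:/rally/reports",
        "elastic/rally:2.10.0",
        "race",
        "--pipeline=benchmark-only",
        "--target-host=remote.host.com:80",
        "--track=my-custom-track",
        "--challenge=default",
        f"--report-file={report}",
        "--report-format=markdown",
        "--on-error=abort",
        "--offline",
    ]


# Print the command for three rounds (the number of rounds is arbitrary here).
for run in range(1, 4):
    print(shlex.join(build_race_command("/path/to/workdir", run)))
```

Each returned list could be passed to `subprocess.run(..., check=True)` to actually start the race; `/path/to/workdir` is a placeholder for the directory that holds the `esrally` folder.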

However, when I ran a few rounds of races (against the same version of ES) and compared the results, I found that the differences can be very large. For example:

|                                                        Metric |                 Task |         Baseline |     Contender |          Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------:|-----------------:|--------------:|--------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                      |     63.0863      |     28.0918   |     -34.9945  |    min |  -55.47% |
|             Min cumulative indexing time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|          Median cumulative indexing time across primary shard |                      |      0.000741667 |      0        |      -0.00074 |    min | -100.00% |
|             Max cumulative indexing time across primary shard |                      |      3.20992     |      3.29123  |       0.08132 |    min |   +2.53% |
|           Cumulative indexing throttle time of primary shards |                      |      0           |      0        |       0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                      |    123.878       |      1.08262  |    -122.796   |    min |  -99.13% |
|                      Cumulative merge count of primary shards |                      |  24024           |     22        |  -24002       |        |  -99.91% |
|                Min cumulative merge time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|             Median cumulative merge time across primary shard |                      |      0.00045     |      0        |      -0.00045 |    min | -100.00% |
|                Max cumulative merge time across primary shard |                      |     17.5161      |      0.566267 |     -16.9498  |    min |  -96.77% |
|              Cumulative merge throttle time of primary shards |                      |      0.259333    |      0.1108   |      -0.14853 |    min |  -57.28% |
|       Min cumulative merge throttle time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|    Median cumulative merge throttle time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|       Max cumulative merge throttle time across primary shard |                      |      0.134967    |      0.110767 |      -0.0242  |    min |  -17.93% |
|                     Cumulative refresh time of primary shards |                      |     48.9657      |      3.7339   |     -45.2318  |    min |  -92.37% |
|                    Cumulative refresh count of primary shards |                      | 224377           |    984        | -223393       |        |  -99.56% |
|              Min cumulative refresh time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|           Median cumulative refresh time across primary shard |                      |      0.003475    |      0        |      -0.00347 |    min | -100.00% |
|              Max cumulative refresh time across primary shard |                      |      5.46093     |      0.523717 |      -4.93722 |    min |  -90.41% |
|                       Cumulative flush time of primary shards |                      |      6.23043     |      2.11223  |      -4.1182  |    min |  -66.10% |
|                      Cumulative flush count of primary shards |                      |  14090           |    132        |  -13958       |        |  -99.06% |
|                Min cumulative flush time across primary shard |                      |      0           |      0        |       0       |    min |    0.00% |
|             Median cumulative flush time across primary shard |                      |      0.00205     |      0        |      -0.00205 |    min | -100.00% |
|                Max cumulative flush time across primary shard |                      |      0.50635     |      0.337367 |      -0.16898 |    min |  -33.37% |
|                                       Total Young Gen GC time |                      |     69.005       |     71.058    |       2.053   |      s |   +2.98% |
|                                      Total Young Gen GC count |                      |   5992           |   5938        |     -54       |        |   -0.90% |
|                                         Total Old Gen GC time |                      |      1.064       |      0.816    |      -0.248   |      s |  -23.31% |
|                                        Total Old Gen GC count |                      |     15           |     10        |      -5       |        |  -33.33% |
|                                                    Store size |                      |     11.527       |     11.6336   |       0.10655 |     GB |   +0.92% |
|                                                 Translog size |                      |      0.6923      |      0.11595  |      -0.57635 |     GB |  -83.25% |
|                                        Heap used for segments |                      |      3.9797      |      3.93391  |      -0.04579 |     MB |   -1.15% |
|                                      Heap used for doc values |                      |      0.778618    |      0.690968 |      -0.08765 |     MB |  -11.26% |
|                                           Heap used for terms |                      |      2.65974     |      2.70258  |       0.04285 |     MB |   +1.61% |
|                                           Heap used for norms |                      |      0.338989    |      0.346619 |       0.00763 |     MB |   +2.25% |
|                                          Heap used for points |                      |      0           |      0        |       0       |     MB |    0.00% |
|                                   Heap used for stored fields |                      |      0.202354    |      0.193741 |      -0.00861 |     MB |   -4.26% |
|                                                 Segment count |                      |    421           |    402        |     -19       |        |   -4.51% |
|                                   Total Ingest Pipeline count |                      |      0           |      0        |       0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                      |      0           |      0        |       0       |     ms |    0.00% |
|                                  Total Ingest Pipeline failed |                      |      0           |      0        |       0       |        |    0.00% |
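
As a note on reading the table: the Diff and Diff % columns appear to be the contender minus the baseline, expressed relative to the baseline. The short Python check below recomputes three of the rows under that assumption. The function name `diff_and_percent` is made up for illustration, and treating a zero baseline as 0.00% is also an assumption based on the all-zero rows above.

```python
def diff_and_percent(baseline: float, contender: float) -> tuple[float, float]:
    """Return (contender - baseline, relative change in percent)."""
    diff = contender - baseline
    # Assumption: a zero baseline is reported as 0.00%, as in the all-zero rows of the table.
    percent = 0.0 if baseline == 0 else diff / baseline * 100.0
    return diff, percent


# (baseline, contender) pairs copied from the comparison table.
rows = {
    "Cumulative indexing time of primary shards": (63.0863, 28.0918),
    "Cumulative merge time of primary shards": (123.878, 1.08262),
    "Cumulative refresh count of primary shards": (224377, 984),
}

for name, (baseline, contender) in rows.items():
    diff, percent = diff_and_percent(baseline, contender)
    print(f"{name}: diff={diff:.4f}, diff%={percent:+.2f}%")
```

With these inputs the recomputed percentages match the -55.47%, -99.13%, and -99.56% shown in the table.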

Questions:

Why are some of the results so different between races, even though each race was run against exactly the same ES version, with the same data set and the same track?
How can I make sure the report reflects the real performance of the ES cluster?
What are the recommended steps for benchmarking a remote ES cluster running on OpenShift?

Thanks.

