Rally does not have control over the configuration of the benchmarked Elasticsearch cluster

Just configured our first successful race, but we're confused about why we are getting this message. Note that we are pointing to an ES Cloud cluster.

ffoti@rally:~$ ~/.local/bin/esrally --pipeline=benchmark-only --track-path /mnt1/benchmarch/tracks/private/ --target-hosts=https://************.us-west-2.aws.found.io:9243 --client-options="use_ssl:false,basic_auth_user:'frank',basic_auth_password:'******'"  --challenge=just-search

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.

************************************************************************
************** WARNING: A dark dungeon lies ahead of you  **************
************************************************************************

Rally does not have control over the configuration of the benchmarked
Elasticsearch cluster.

Be aware that results may be misleading due to problems with the setup.
Rally is also not able to gather lots of metrics at all (like CPU usage
of the benchmarked cluster) or may even produce misleading metrics (like
the index size).

************************************************************************
****** Use this pipeline only if you are aware of the tradeoffs.  ******
*************************** Watch your step! ***************************
************************************************************************

[INFO] Racing on track [private], challenge [just-search] and car ['external'] with version [7.5.1].

[WARNING] merges_total_time is 6734435 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_throttled_time is 417635 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 6422054 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 6990313 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 185787 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running search                                                                 [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |   Task |    Value |   Unit |
|---------------------------------------------------------------:|-------:|---------:|-------:|
|                     Cumulative indexing time of primary shards |        |  107.044 |    min |
|             Min cumulative indexing time across primary shards |        |        0 |    min |
|          Median cumulative indexing time across primary shards |        |        0 |    min |
|             Max cumulative indexing time across primary shards |        |  32.1033 |    min |
|            Cumulative indexing throttle time of primary shards |        |        0 |    min |
|    Min cumulative indexing throttle time across primary shards |        |        0 |    min |
| Median cumulative indexing throttle time across primary shards |        |        0 |    min |
|    Max cumulative indexing throttle time across primary shards |        |        0 |    min |
|                        Cumulative merge time of primary shards |        |  112.241 |    min |
|                       Cumulative merge count of primary shards |        |    37229 |        |
|                Min cumulative merge time across primary shards |        |        0 |    min |
|             Median cumulative merge time across primary shards |        |        0 |    min |
|                Max cumulative merge time across primary shards |        |  35.0529 |    min |
|               Cumulative merge throttle time of primary shards |        |  6.96058 |    min |
|       Min cumulative merge throttle time across primary shards |        |        0 |    min |
|    Median cumulative merge throttle time across primary shards |        |        0 |    min |
|       Max cumulative merge throttle time across primary shards |        |  3.55435 |    min |
|                      Cumulative refresh time of primary shards |        |  116.512 |    min |
|                     Cumulative refresh count of primary shards |        |   357633 |        |
|              Min cumulative refresh time across primary shards |        |        0 |    min |
|           Median cumulative refresh time across primary shards |        |        0 |    min |
|              Max cumulative refresh time across primary shards |        |  31.2121 |    min |
|                        Cumulative flush time of primary shards |        |  3.09645 |    min |
|                       Cumulative flush count of primary shards |        |     7828 |        |
|                Min cumulative flush time across primary shards |        |        0 |    min |
|             Median cumulative flush time across primary shards |        |        0 |    min |
|                Max cumulative flush time across primary shards |        |  1.22675 |    min |
|                                             Total Young Gen GC |        |    0.678 |      s |
|                                               Total Old Gen GC |        |    0.075 |      s |
|                                                     Store size |        |  27.6191 |     GB |
|                                                  Translog size |        | 0.858794 |     GB |
|                                         Heap used for segments |        |  27.3574 |     MB |
|                                       Heap used for doc values |        |  3.59054 |     MB |
|                                            Heap used for terms |        |  15.0366 |     MB |
|                                            Heap used for norms |        |  1.32898 |     MB |
|                                           Heap used for points |        |  4.32471 |     MB |
|                                    Heap used for stored fields |        |  3.07653 |     MB |
|                                                  Segment count |        |     1828 |        |
|                                                 Min Throughput | search |     9.95 |  ops/s |
|                                              Median Throughput | search |    10.01 |  ops/s |
|                                                 Max Throughput | search |    10.01 |  ops/s |
|                                        50th percentile latency | search |  87.8943 |     ms |
|                                        90th percentile latency | search |  145.228 |     ms |
|                                        99th percentile latency | search |  178.274 |     ms |
|                                       100th percentile latency | search |  185.229 |     ms |
|                                   50th percentile service time | search |  87.2966 |     ms |
|                                   90th percentile service time | search |  94.9166 |     ms |
|                                   99th percentile service time | search |  169.021 |     ms |
|                                  100th percentile service time | search |  171.141 |     ms |
|                                                     error rate | search |        0 |      % |


--------------------------------
[INFO] SUCCESS (took 41 seconds)
--------------------------------

Hi Frank,
This is the trade-off of the "benchmark-only" pipeline: https://esrally.readthedocs.io/en/stable/pipelines.html#benchmark-only
As Rally has not configured and started the Elasticsearch cluster itself, results are not comparable to results from benchmarks where Rally was in control from start to finish, and may not be repeatable.
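
To make the "not in a defined clean state" warnings a bit more concrete: the counters in those warnings are cumulative totals in milliseconds, and the summary table appears to report the same kind of totals in minutes. Here is a minimal sketch of that conversion, using values copied from the output above. The helper name `ms_to_min` is only for illustration and is not part of Rally.

```python
# Cumulative counters copied from the [WARNING] lines in the output above (ms).
warning_totals_ms = {
    "merges_total_time": 6734435,
    "merges_total_throttled_time": 417635,
    "flush_total_time": 185787,
}


def ms_to_min(ms: int) -> float:
    """Convert milliseconds to minutes (illustrative helper, not part of Rally)."""
    return ms / 60_000


for name, ms in warning_totals_ms.items():
    print(f"{name}: {ms_to_min(ms):.3f} min")
```

These line up with the 112.241, 6.96058 and 3.09645 minute figures in the table, which suggests that the merge and flush numbers mostly describe what the cluster did before the race, not the 41-second search-only race itself.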

Hey Dennis,

I understand it better now. Our goal is to benchmark internal tests that load data into Elasticsearch and then query that data, so we can compare between Dev Test and Prod. Those ES environments will be controlled by us, so we will be aware of any differences and will be monitoring them directly as well. So benchmark-only is valid there, correct?
We hope to learn at least two things:

  1. How does the same test perform against two differently configured ES deployments?
  2. How does the same test perform against the same ES deployment when the process changes, such as a different DSL query?

What are some of the key metrics we should be looking at in the Rally output? We are planning on building some dashboards in Kibana to display Rally data aligned with our metric data to form the “observability”. Are there any pre-built dashboards?
Let me know any thoughts, links or direction.

Thanks for your response.

Correct, benchmark-only is what you're going to need.
If you are careful, Rally can provide answers to those benchmarking questions and plenty more.

The key numbers mostly depend on the use case. We maintain a performance dashboard with Kibana charts of nightly Elasticsearch builds benchmarked with rally-tracks; the charts are configured by our Chart Generator.
Perhaps these could provide some inspiration?
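
As a rough illustration of the first kind of comparison you describe (same test, two deployments), here is a minimal sketch that computes the relative change per metric between two runs. The baseline numbers are the search latency and service time values from the summary table earlier in this thread. The contender numbers are invented placeholders, and `percent_change` and `compare_runs` are hypothetical helper names, not part of Rally.

```python
def percent_change(baseline: float, contender: float) -> float:
    """Relative change of contender vs. baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (contender - baseline) / baseline * 100.0


def compare_runs(baseline: dict, contender: dict) -> dict:
    """Percent change for every metric present in both runs."""
    return {
        metric: percent_change(baseline[metric], contender[metric])
        for metric in baseline
        if metric in contender
    }


# Values from the summary table in this thread (ms, task "search").
baseline = {
    "50th percentile latency": 87.8943,
    "99th percentile latency": 178.274,
    "50th percentile service time": 87.2966,
    "99th percentile service time": 169.021,
}

# PLACEHOLDER values for a second deployment; these are not real measurements.
contender = {
    "50th percentile latency": 80.0,
    "99th percentile latency": 190.0,
    "50th percentile service time": 79.5,
    "99th percentile service time": 160.0,
}

for metric, change in compare_runs(baseline, contender).items():
    print(f"{metric}: {change:+.1f}%")
```

A negative percentage means the contender was faster for that metric. Since benchmark-only results depend on whatever else the cluster is doing, it is probably worth repeating each race a few times before reading much into small differences.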
