How to increase the number of shards or replicas dynamically in single race on esrally

qksjdhi1212 · November 24, 2022, 5:25pm

I plan to set up a cluster and tune it by increasing the number of shards or replicas while benchmarking it with esrally.

Is there any way to start with one shard and increase to n shards in a single esrally race?

+) I found the following graph here (Benchmarking and sizing your Elasticsearch cluster for logs and metrics | Elastic Blog)

Was this produced by adding nodes one by one during a single esrally race, or by running separate tests (a one-node test, a two-node test and a three-node test) and combining the results into one graph?

Bradley_Deam · November 28, 2022, 12:31am

Howdy, thanks for your interest in Rally!

Is there any way to start with one shard and increase to n shards in a single esrally race?

You could definitely build a custom track that does this by specifying a series of Operations that create your target indices or data streams with incrementing shard counts (see Creating a track from scratch).

However, I'd suggest that it's probably easier to define a single benchmark that instead makes use of Track Parameters to set the number of primary and replica shards. That way you get separate 'Race' results for each shard count, e.g.:

$ esrally race [...] --track-params="number_of_shards:1,number_of_replicas:1"
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [adaa859b-98e0-4267-ad1c-a9b0983071e8]
[...]

$ esrally race [...] --track-params="number_of_shards:2,number_of_replicas:1"
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [fb387eee-729e-4895-a57b-f5a2fb96b6f1]
[...]
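If you want to sweep several shard counts, the two invocations above can be wrapped in a loop. The following is only a minimal sketch: `RALLY_ARGS`, `REPLICAS` and `DRY_RUN` are hypothetical variables introduced here for illustration (they are not Rally options), and `RALLY_ARGS` stands in for whatever you pass in place of `[...]`. It also assumes the track you run actually reads `number_of_shards` and `number_of_replicas` in its index settings; otherwise the parameters have no effect.

```shell
#!/bin/sh
# Sketch: run one race per shard count so each gets its own race id.
# By default (DRY_RUN=1) the commands are only printed, not executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"        # hypothetical switch: set to 0 to really run esrally
REPLICAS="${REPLICAS:-1}"      # replica count kept constant across races
RALLY_ARGS="${RALLY_ARGS:-}"   # hypothetical: your own options (track, hosts, ...)

run_race() {
  shards="$1"
  params="number_of_shards:${shards},number_of_replicas:${REPLICAS}"
  # $RALLY_ARGS is intentionally unquoted so it splits into separate options.
  set -- esrally race $RALLY_ARGS --track-params="$params"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Shard counts to test; adjust to your own sizing question.
for shards in 1 2 4; do
  run_race "$shards"
done
```

Each real run prints its own `Race id is [...]` line, as in the output above; note those ids down, since they are what you pass to the compare step.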

The added benefit of having separate Races is that you can quickly compare their results via the compare subcommand:

$ esrally compare --baseline  adaa859b-98e0-4267-ad1c-a9b0983071e8 --contender fb387eee-729e-4895-a57b-f5a2fb96b6f1

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/


Comparing baseline
  Race ID: adaa859b-98e0-4267-ad1c-a9b0983071e8 
  Race timestamp: 2022-11-23 02:17:25
  Car: external

with contender
  Race ID: fb387eee-729e-4895-a57b-f5a2fb96b6f1
  Race timestamp: 2022-11-23 02:17:14
  Car: external

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                        Metric |   Task |        Baseline |      Contender |          Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|-------:|----------------:|---------------:|--------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |        |      0.1413     |    0.0124833   |      -0.12882 |    min |  -91.17% |
|             Min cumulative indexing time across primary shard |        |      0.02305    |    0.000383333 |      -0.02267 |    min |  -98.34% |
|          Median cumulative indexing time across primary shard |        |      0.0243667  |    0.00188333  |      -0.02248 |    min |  -92.27% |
|             Max cumulative indexing time across primary shard |        |      0.0455     |    0.00758333  |      -0.03792 |    min |  -83.33% |
|           Cumulative indexing throttle time of primary shards |        |      0          |    0           |       0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |        |      0          |    0           |       0       |    min |    0.00% |
|                      Cumulative merge count of primary shards |        |      0          |    0           |       0       |        |    0.00% |
|                Min cumulative merge time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|             Median cumulative merge time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|                Max cumulative merge time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|              Cumulative merge throttle time of primary shards |        |      0          |    0           |       0       |    min |    0.00% |
|       Min cumulative merge throttle time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|    Median cumulative merge throttle time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|       Max cumulative merge throttle time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|                     Cumulative refresh time of primary shards |        |      0.0325833  |    0.00103333  |      -0.03155 |    min |  -96.83% |
|                    Cumulative refresh count of primary shards |        |     16          |   13           |      -3       |        |  -18.75% |
|              Min cumulative refresh time across primary shard |        |      0.00126667 |    0           |      -0.00127 |    min | -100.00% |
|           Median cumulative refresh time across primary shard |        |      0.00128333 |    0.000216667 |      -0.00107 |    min |  -83.12% |
|              Max cumulative refresh time across primary shard |        |      0.02745    |    0.000466667 |      -0.02698 |    min |  -98.30% |
|                       Cumulative flush time of primary shards |        |      0          |    0           |       0       |    min |    0.00% |
|                      Cumulative flush count of primary shards |        |      0          |    0           |       0       |        |    0.00% |
|                Min cumulative flush time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|             Median cumulative flush time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|                Max cumulative flush time across primary shard |        |      0          |    0           |       0       |    min |    0.00% |
|                                       Total Young Gen GC time |        |      0.021      |    0           |      -0.021   |      s | -100.00% |
|                                      Total Young Gen GC count |        |      1          |    0           |      -1       |        | -100.00% |
|                                         Total Old Gen GC time |        |      0          |    0           |       0       |      s |    0.00% |
|                                        Total Old Gen GC count |        |      0          |    0           |       0       |        |    0.00% |
|                                                    Store size |        |      0.0131358  |    0.000311377 |      -0.01282 |     GB |  -97.63% |
|                                                 Translog size |        |      0.0136973  |    0.000307606 |      -0.01339 |     GB |  -97.75% |
|                                        Heap used for segments |        |      0          |    0           |       0       |     MB |    0.00% |
|                                      Heap used for doc values |        |      0          |    0           |       0       |     MB |    0.00% |
|                                           Heap used for terms |        |      0          |    0           |       0       |     MB |    0.00% |
|                                           Heap used for norms |        |      0          |    0           |       0       |     MB |    0.00% |
|                                          Heap used for points |        |      0          |    0           |       0       |     MB |    0.00% |
|                                   Heap used for stored fields |        |      0          |    0           |       0       |     MB |    0.00% |
|                                                 Segment count |        |     12          |    4           |      -8       |        |  -66.67% |
|                                   Total Ingest Pipeline count |        |      0          |    0           |       0       |        |    0.00% |
|                                    Total Ingest Pipeline time |        |      0          |    0           |       0       |     ms |    0.00% |
|                                  Total Ingest Pipeline failed |        |      0          |    0           |       0       |        |    0.00% |
|                                                Min Throughput |   bulk | 136251          | 2298.03        | -133953       | docs/s |  -98.31% |
|                                               Mean Throughput |   bulk | 136251          | 2298.03        | -133953       | docs/s |  -98.31% |
|                                             Median Throughput |   bulk | 136251          | 2298.03        | -133953       | docs/s |  -98.31% |
|                                                Max Throughput |   bulk | 136251          | 2298.03        | -133953       | docs/s |  -98.31% |
|                                       50th percentile latency |   bulk |   1130.42       |  276.353       |    -854.064   |     ms |  -75.55% |
|                                      100th percentile latency |   bulk |   1752.18       |  381.124       |   -1371.06    |     ms |  -78.25% |
|                                  50th percentile service time |   bulk |   1130.42       |  276.353       |    -854.064   |     ms |  -75.55% |
|                                 100th percentile service time |   bulk |   1752.18       |  381.124       |   -1371.06    |     ms |  -78.25% |
|                                                    error rate |   bulk |      0          |    0           |       0       |      % |    0.00% |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------
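With more than two races, the same compare call can be repeated against a fixed baseline. The sketch below is illustrative only: `compare_races`, `pct_diff` and `DRY_RUN` are hypothetical names introduced here, and the race ids are the two from the output above. The `pct_diff` helper reflects my reading of the `Diff %` column as (contender - baseline) / baseline × 100, which matches the rows shown, but treat that formula as an assumption rather than documented behaviour.

```shell
#!/bin/sh
# Sketch: compare several contender races against one baseline race.
# By default (DRY_RUN=1) the commands are only printed, not executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: set to 0 to really run esrally

compare_races() {
  baseline="$1"
  contender="$2"
  set -- esrally compare --baseline "$baseline" --contender "$contender"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assumed formula for the "Diff %" column; a zero baseline is reported as 0.00.
pct_diff() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    if (b == 0) { print "0.00"; exit }
    printf "%.2f\n", (c - b) / b * 100
  }'
}

BASELINE="adaa859b-98e0-4267-ad1c-a9b0983071e8"
# Add further race ids to this list as you run more races.
for contender in fb387eee-729e-4895-a57b-f5a2fb96b6f1; do
  compare_races "$BASELINE" "$contender"
done

# Reproduces the "Cumulative indexing time of primary shards" row.
pct_diff 0.1413 0.0124833
```

Checking one or two rows by hand like this is a quick way to confirm which race is the baseline and which direction a negative percentage points in.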

+) I found the following graph here (Benchmarking and sizing your Elasticsearch cluster for logs and metrics | Elastic Blog)
Was this produced by adding nodes one by one during a single esrally race, or by running separate tests (a one-node test, a two-node test and a three-node test) and combining the results into one graph?

The chart shows results on the Y-axis and each race on the X-axis, i.e. these are separate races. You can read more about using an External Metrics Store here.

Lastly, I recommend watching the talk titled The Seven Deadly Sins of Elasticsearch Benchmarking. It does a great job of covering common pitfalls and gotchas when it comes to properly analysing benchmark results.

system · December 26, 2022, 12:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
