Store size in esrally summary report doesn't match real index size

I'm using a custom track and benchmarking against an existing cluster (single node).

This is the summary report for my custom track:

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [69db9260-b476-483a-9e35-09975b8c00db]
[INFO] Racing on track [indexing-6gb], challenge [bulk-indexing-1gb] and car ['external'] with version [8.5.2].

Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running bulk                                                                   [100% done]
Running force-merge                                                            [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |   Task |           Value |   Unit |
|---------------------------------------------------------------:|-------:|----------------:|-------:|
|                     Cumulative indexing time of primary shards |        |    13.2576      |    min |
|             Min cumulative indexing time across primary shards |        |     0           |    min |
|          Median cumulative indexing time across primary shards |        |     1.66667e-05 |    min |
|             Max cumulative indexing time across primary shards |        |    13.2343      |    min |
|            Cumulative indexing throttle time of primary shards |        |     0           |    min |
|    Min cumulative indexing throttle time across primary shards |        |     0           |    min |
| Median cumulative indexing throttle time across primary shards |        |     0           |    min |
|    Max cumulative indexing throttle time across primary shards |        |     0           |    min |
|                        Cumulative merge time of primary shards |        |     1.59255     |    min |
|                       Cumulative merge count of primary shards |        |    15           |        |
|                Min cumulative merge time across primary shards |        |     0           |    min |
|             Median cumulative merge time across primary shards |        |     0           |    min |
|                Max cumulative merge time across primary shards |        |     1.59255     |    min |
|               Cumulative merge throttle time of primary shards |        |     0.204433    |    min |
|       Min cumulative merge throttle time across primary shards |        |     0           |    min |
|    Median cumulative merge throttle time across primary shards |        |     0           |    min |
|       Max cumulative merge throttle time across primary shards |        |     0.204433    |    min |
|                      Cumulative refresh time of primary shards |        |     0.0978667   |    min |
|                     Cumulative refresh count of primary shards |        |    57           |        |
|              Min cumulative refresh time across primary shards |        |     0           |    min |
|           Median cumulative refresh time across primary shards |        |     0           |    min |
|              Max cumulative refresh time across primary shards |        |     0.0942833   |    min |
|                        Cumulative flush time of primary shards |        |     0.459433    |    min |
|                       Cumulative flush count of primary shards |        |    17           |        |
|                Min cumulative flush time across primary shards |        |     0           |    min |
|             Median cumulative flush time across primary shards |        |     0           |    min |
|                Max cumulative flush time across primary shards |        |     0.459433    |    min |
|                                        Total Young Gen GC time |        |     0.783       |      s |
|                                       Total Young Gen GC count |        |    42           |        |
|                                          Total Old Gen GC time |        |     0           |      s |
|                                         Total Old Gen GC count |        |     0           |        |
|                                                     Store size |        |     1.01798     |     GB |
|                                                  Translog size |        |     0.00573797  |     GB |
|                                         Heap used for segments |        |     0           |     MB |
|                                       Heap used for doc values |        |     0           |     MB |
|                                            Heap used for terms |        |     0           |     MB |
|                                            Heap used for norms |        |     0           |     MB |
|                                           Heap used for points |        |     0           |     MB |
|                                    Heap used for stored fields |        |     0           |     MB |
|                                                  Segment count |        |    98           |        |
|                                    Total Ingest Pipeline count |        |     0           |        |
|                                     Total Ingest Pipeline time |        |     0           |      s |
|                                   Total Ingest Pipeline failed |        |     0           |        |
|                                                 Min Throughput |   bulk |  1868.74        | docs/s |
|                                                Mean Throughput |   bulk | 30598.8         | docs/s |
|                                              Median Throughput |   bulk | 33351.4         | docs/s |
|                                                 Max Throughput |   bulk | 35072.8         | docs/s |
|                                        50th percentile latency |   bulk |   719.27        |     ms |
|                                        90th percentile latency |   bulk |  1144.77        |     ms |
|                                        99th percentile latency |   bulk |  2510.96        |     ms |
|                                      99.9th percentile latency |   bulk |  3517.92        |     ms |
|                                       100th percentile latency |   bulk |  4086.26        |     ms |
|                                   50th percentile service time |   bulk |   719.27        |     ms |
|                                   90th percentile service time |   bulk |  1144.77        |     ms |
|                                   99th percentile service time |   bulk |  2510.96        |     ms |
|                                 99.9th percentile service time |   bulk |  3517.92        |     ms |
|                                  100th percentile service time |   bulk |  4086.26        |     ms |
|                                                     error rate |   bulk |     0           |      % |

It says the store size is 1.01798 GB, but my Kibana view and the index API report a different value.

I got this info using _cat/shards?v=true&h=index,prirep,shard,store&s=prirep,store&bytes=mb; please check indexing-six-gb.
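In case it helps, here is a minimal sketch that fetches the same per-shard store sizes in JSON form (it assumes an unauthenticated single node at http://localhost:9200; adjust the host for your setup):

import requests

# Same query as the _cat/shards call above, but asking for JSON output.
resp = requests.get(
    "http://localhost:9200/_cat/shards",
    params={
        "h": "index,prirep,shard,store",
        "s": "prirep,store",
        "bytes": "mb",
        "format": "json",
    },
)
for row in resp.json():
    if row["index"] == "indexing-six-gb":
        print(row)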

My custom track:

{
  "version": 2,
  "description": "",
  "indices": [
    {
      "name": "indexing-six-gb",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "log-data-6gb",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 8000000,
          "uncompressed-bytes": 6066000000
        }
      ]
    }
  ],
  "challenges": [
    {
      "name": "bulk-indexing-1gb",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "delete-index"
          }
        },
        {
          "operation": {
            "operation-type": "create-index"
          }
        },
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            },
            "index": "indexing-six-gb",
            "retry-until-success": true
          }
        },
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000
          },
          "warmup-time-period": 0,
          "clients": 6
        },
        {
          "operation": {
            "operation-type": "force-merge"
          }
        }
      ]
    }
  ]
}
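For reference, the document-count and uncompressed-bytes fields have to describe documents.json. They can be computed with a minimal sketch like this (plain Python; it assumes the corpus is line-delimited JSON, one document per line, as Rally expects):

import os

path = "documents.json"

# Rally's document-count is the number of lines (one JSON document per line).
with open(path, "rb") as f:
    count = sum(1 for _ in f)

# uncompressed-bytes is simply the size of the uncompressed corpus file.
size = os.path.getsize(path)

print(f'"document-count": {count},')
print(f'"uncompressed-bytes": {size}')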

Sometimes the store size equals the real index size, but sometimes it doesn't.

It seems really weird, because the store size value in the report appears to change depending on the values of the indices.name and corpora.name fields.

When a specific index and corpus name is assigned, the store size takes a specific value (e.g. when the index name is indexing-1gb and the corpus name is log-data-1gb, the store size is 0.04 GB, which is the real size of the index).

A single property of the test result has turned out to be wrong, so now I can't trust the whole result of the race. It's really frustrating. How can I fix this?

Hey there :wave:! Firstly, thanks for your interest in and usage of Rally.

> It seems really weird, because the store size value in the report appears to change depending on the values of the indices.name and corpora.name fields.

> When a specific index and corpus name is assigned, the store size takes a specific value (e.g. when the index name is indexing-1gb and the corpus name is log-data-1gb, the store size is 0.04 GB, which is the real size of the index).

> A single property of the test result has turned out to be wrong, so now I can't trust the whole result of the race. It's really frustrating. How can I fix this?

I'm sorry, but I don't think I understand your issue entirely. I used your track example (with my own sample data) to try to reproduce the issue and wasn't able to.

Based on what you've stated, I can confirm that the naming of the index and corpora shouldn't change the resulting Store size metric Rally reports at the end of the benchmark.

So, I think there may be a slight confusion here, and that's because the Store size metric reported by Rally is inclusive of all indices within the target cluster (_all), not just the index targeted by the track.

You can see the source code here, where we parse the index stats for all indices:

self.add_metrics(self.extract_value(index_stats, ["_all", "total", "store", "size_in_bytes"]), "store_size_in_bytes", "byte")
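In other words, that call walks the index stats response along the path _all, total, store, size_in_bytes, i.e. the cluster-wide total. A rough standalone equivalent of that helper (hypothetical, not Rally's actual implementation):

def extract_value(stats, path, default=0):
    # Walk a nested dict along the given key path, falling back to a default
    # if any key along the way is missing.
    value = stats
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value

# Illustrative shape of the relevant part of a GET /_stats response:
index_stats = {"_all": {"total": {"store": {"size_in_bytes": 1_093_069_176}}}}
print(extract_value(index_stats, ["_all", "total", "store", "size_in_bytes"]))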

Is it possible that you have some other indices within the cluster (maybe they're also hidden indices?) that could be affecting the end result?
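To check, you can compare the _all aggregate (what Rally reports) against your single index, and then list every index, hidden ones included. A minimal sketch, again assuming an unauthenticated node at http://localhost:9200:

import requests

ES = "http://localhost:9200"
GB = 1024 ** 3  # Rally's summary converts bytes to GB using binary units

# What Rally reports: store size aggregated across ALL indices in the cluster.
all_bytes = requests.get(f"{ES}/_all/_stats/store").json()[
    "_all"]["total"]["store"]["size_in_bytes"]

# Store size of just the benchmarked index.
idx_bytes = requests.get(f"{ES}/indexing-six-gb/_stats/store").json()[
    "_all"]["total"]["store"]["size_in_bytes"]

print(f"_all store size:            {all_bytes / GB:.5f} GB")
print(f"indexing-six-gb store size: {idx_bytes / GB:.5f} GB")

# List every index (including hidden ones) to see where any difference comes from.
rows = requests.get(
    f"{ES}/_cat/indices?format=json&bytes=b&expand_wildcards=all").json()
for row in sorted(rows, key=lambda r: int(r["store.size"] or 0), reverse=True):
    print(row["index"], row["store.size"])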
