Understanding Metrics ( Cumulative indexing time of primary shards )

Hi ,
I have a corpus of 100k ( 1 lakh products ) in json , I am bulk indexing to one of the cluster with bulk_size of 10k .

GET product__v99/_stats

{
  "indexing": {
          "index_total": 100000,
          "index_time_in_millis": 46208,
          "index_current": 0,
          "index_failed": 0,
          "delete_total": 0,
          "delete_time_in_millis": 0,
          "delete_current": 0,
          "noop_update_total": 0,
          "is_throttled": false,
          "throttle_time_in_millis": 0
        },
    "bulk": {
          "total_operations": 10,
          "total_time_in_millis": 46489,
          "total_size_in_bytes": 224500000,
          "avg_time_in_millis": 3029,
          "avg_size_in_bytes": 14622169
        }
}

But when I check the csv from esrally it says Cumulative indexing time of primary shards = 38.31 mins .

Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,38.31916666666667,min
Min cumulative indexing time across primary shards,,0,min
Median cumulative indexing time across primary shards,,0.00018333333333333334,min
Max cumulative indexing time across primary shards,,8.733716666666668,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min

I am really confused because there is a huge differnece in indexing time from report vs reality .

are only 2 threads in write pool , index is on 2 shards.

These stats use Elasticsearch's index stats API. Specifically, they use _stats/_all?level=shards and from there the index_time_in_millis stat.

For the sake of an example, I ran the geonames append-no-conflicts-index-only benchmark against a single ES 8.6.1 node; this generates one geonames index with 5 pri / 0 replica by default.

After the end of the benchmark I checked the stats API:

curl ... https://.../geonames/_stats/_all?pretty\&human\&level=shards | jq ._all.primaries.indexing
{
  "index_total": 11396503,
  "index_time": "12.8m",
  "index_time_in_millis": 773155,
...
}

whereas rally reported

| Cumulative indexing time of primary shards | | 12.8859 | min |

which matches what stats reported above (see index_time)

@dliappis Thanks for the reply , But this still doesn't help .

When I ran the same index/_stats api I am getting 46208 millisec ~ 46 seconds . From ES.
The same index was actually created and indexed using ES rally , and there i am seeing 38.31 mins .

Command I ran

for env in "dev"; do 
    for client_count in 1; do 
        cd /Users/jarvis/rally-tracks/es-benchmarking/reports
        mkdir $env
        cd $env
        mkdir $client
        for bulk_size in  10000 ; do
            # Your commands go here
            echo "Iteration $bulk_size client $client_count"
            esrally race --track-path=~/rally-tracks/es-benchmarking/esRallyTracks --target-hosts=<client_url> --client-options="use_ssl:true,verify_certs:true,basic_auth_user:'elastic',basic_auth_password:''" --kill-running-processes  --pipeline=benchmark-only --report-file=/Users/jarvis/rally-tracks/es-benchmarking/reports/$env/$client_count/$bulk_size.csv --report-format=csv --track-params=bulk_size:$bulk_size,clients:$client_count
            echo "Iteration $bulk_size client $client_count done"
            sleep 10 # Sleep for 5 minutes (300 seconds)
        done
    done
done

Attaching the files For ref :

{
    "version": 2,
    "description": "benchmarking for bulk product indexing",
    "indices": [
      {
        "name": "product__v99",
        "body": "index.json"
      }
    ],
    "corpora": [
      {
        "name": "rally-tutorial",
        "documents": [
          {
            "source-file": "products.json",
            "document-count": 100000,
            "uncompressed-bytes": 224599999
          }
        ]
      }
    ],
    "schedule": [
      {
        "operation": {
          "operation-type": "delete-index"
        }
      },
      {
        "operation": {
          "operation-type": "create-index"
        }
      },
      {
        "operation": {
          "operation-type": "cluster-health",
          "request-params": {
            "wait_for_status": "yellow"
          },
          "retry-until-success": true
        }
      },
      {
        "operation": {
          "operation-type": "bulk",
          "bulk-size": {{ bulk_size|default(5000) }}
        },
        "warmup-time-period": 0,
        "clients": {{ clients|default(1) }}
      },
      {
        "operation": {
          "operation-type": "force-merge"
        }
      }
    ]
  }

Hello,

As mentioned in my previous reply Rally will pull those metrics from all indices.

In your first example you have been looking at the stats of the specific index Rally is indexing to; however the stats that Rally reports for Cumulative indexing time for primary shards are for all indices as collected from _stats/_all?level=shard under ._all.primaries.indexing.index_time_in_millis.

If you are hitting a pre-existing cluster Rally will warn you about it when it starts:

<metric X> is <Y> ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.

In my previous example you can see that for a freshly started, dedicated cluster for benchmarking, the reported metrics are identical.

Please make sure that you are comparing the correct metrics and that your cluster is not dealing with other indexing workloads at the same time, while the benchmark is running.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.