What exactly does "bulk operation" mean in a custom es-rally track?

I configured the bulk operation with a bulk-size of 5000 and 5 clients:

{
  "version": 2,
  "description": "",
  "indices": [
    {
      "name": "indexing-1gb",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "log-data-1gb",
      "documents": [
        {
          "source-file": "documents.json",
          "document-count": 400000,
          "uncompressed-bytes": 298500000
        }
      ]
    }
  ],
  "challenges": [
    {
      "name": "bulk-indexing-1gb",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "delete-index"
          }
        },
        {
          "operation": {
            "operation-type": "create-index"
          }
        },
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            },
            "retry-until-success": true
          }
        },
        ##### HERE ##### 
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": 5000
          },
          "warmup-time-period": 120,
          "clients": 5
        },
        ############### 
        {
          "operation": {
            "operation-type": "force-merge"
          }
        }
      ]
    }
  ]
}

I thought the test would work like this:

  1. One client sends 5000 documents in one bulk API request.
  2. One bulk API request is sent every second (really?).
  3. There are 5 clients, so the 5 clients send 25000 (5000 * 5) documents in total every second.
  4. As a result, Elasticsearch is asked to index 25000 documents every second (???).

Is that correct? If I set a bulk operation, is the bulk API requested every second?
Also, if there are multiple clients on one operation, does every client send its request to Elasticsearch at exactly the same time?

Hello @qksjdhi1212, thanks for your interest in Rally!

Note that the bulk operation is specified in the Track Reference of the Rally 2.10.0 documentation. The main thing to understand is that we don't typically set target-throughput with bulk requests. We let indexing run unthrottled and then measure how many docs were indexed per second.

To answer your questions:

  1. Yes, each client sends bulk requests of 5000 docs.
  2. No, each client waits for an answer from Elasticsearch before sending another request.
  3. No. Clients work concurrently and send their bulk requests without coordinating with each other, because the file is split between them beforehand. You have 400000 docs, which is 80000 per client (400000 / 5). Each client sends 5000 docs per bulk request, so each client performs 16 requests in total (80000 / 5000). Measuring how long those requests take is the point of the benchmark.
  4. No, as explained in 3.
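
The arithmetic in answer 3 can be sketched in a few lines of Python. This is only an illustration: `bulk_plan` is a hypothetical helper, not part of Rally, and it assumes the corpus is split evenly across clients.

```python
import math


def bulk_plan(document_count, clients, bulk_size):
    """Return (docs per client, bulk requests per client).

    Assumes an even split of the corpus across clients; the last
    request of a client may be smaller than bulk_size.
    """
    docs_per_client = document_count // clients
    requests_per_client = math.ceil(docs_per_client / bulk_size)
    return docs_per_client, requests_per_client


# Values from the track above: 400000 docs, 5 clients, bulk-size 5000.
docs_per_client, requests_per_client = bulk_plan(400_000, 5, 5000)
print(docs_per_client, requests_per_client)  # 80000 16
```

Nothing here fixes a rate per second: how quickly each client gets through its 16 requests depends on how fast Elasticsearch answers, which is what the benchmark reports.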

A good rule of thumb for setting the number of clients when bulk indexing is twice the total number of cores in your cluster. Say you have 3 nodes, each with 4 cores dedicated to Elasticsearch; then you should use 3 * 4 * 2 = 24 clients.
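
Written out as a small calculation, the rule of thumb looks like this. `suggested_bulk_clients` is a hypothetical name used only for illustration, not a Rally function, and the result is a starting point rather than a guarantee.

```python
def suggested_bulk_clients(nodes, cores_per_node, factor=2):
    """Rule of thumb: clients = factor * total cores in the cluster."""
    return nodes * cores_per_node * factor


# 3 nodes with 4 cores each, as in the example above.
print(suggested_bulk_clients(3, 4))  # 24
```

The result would go into the `clients` property of the bulk task in the track's schedule.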
