How to use the same operation multiple times inside the same schedule

Hi,

I want to index the same data set several times with different bulk sizes. What I want to achieve is basically:

  1. Create an index with bulk-size=100
  2. Delete the index
  3. Create the same index with bulk-size=500
  4. Delete the index
  5. Create the same index with bulk-size=1000
  6. Delete the index, and so on

However, if I specify the delete operation multiple times inside the schedule for the challenge, Rally says it's duplicated. Is it not possible to use the same operation multiple times within a schedule?

"challenges": [
  {
    "name": "index-and-query",
    "default": true,
    "schedule": [
      {
        "operation": {
          "operation-type": "delete-index"
        }
      },
      {
        "operation": {
          "operation-type": "create-index"
        }
      },
      {
        "operation": {
          "operation-type": "cluster-health",
          "request-params": {
            "wait_for_status": "green"
          }
        }
      },
      {
        "operation": {
          "operation-type": "bulk",
          "bulk-size": 5000
        },
        "warmup-time-period": 120,
        "clients": 8
      },
      {
        "operation": {
          "operation-type": "delete-index"
        }
      },
      {
        "operation": {
          "operation-type": "create-index"
        }
      },
      {
        "operation": {
          "operation-type": "bulk",
          "bulk-size": 10000
        },
        "warmup-time-period": 120,
        "clients": 8
      }
    ]
  }
]

Hi,

The reason for this error message is that Rally would otherwise mix the metrics of all tasks that use the same operation, e.g. all three bulk operations. So you need to give each repeated task a unique name, e.g.:

{
  "name": "bulk-10000",
  "operation": {
    "operation-type": "bulk",
    "bulk-size": 10000
  },
  "warmup-time-period": 120,
  "clients": 8
}
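
If you do want to keep everything in one schedule, the unique names can also be generated instead of written by hand. The following is only a minimal sketch in Python: the helper name named_tasks and the naming pattern (operation type plus bulk size) are my own assumptions for illustration and not something Rally prescribes. It prints a schedule that you could paste into the "schedule" array of the challenge.

```python
import json


def named_tasks(bulk_sizes, clients=8, warmup=120):
    """Build a schedule in which every repeated task has a unique name.

    The "<operation-type>-<bulk size>" naming pattern is an assumption for
    illustration; any scheme works as long as the names are unique.
    """
    schedule = []
    for size in bulk_sizes:
        schedule.append({
            "name": f"delete-index-{size}",
            "operation": {"operation-type": "delete-index"},
        })
        schedule.append({
            "name": f"create-index-{size}",
            "operation": {"operation-type": "create-index"},
        })
        schedule.append({
            "name": f"bulk-{size}",
            "operation": {"operation-type": "bulk", "bulk-size": size},
            "warmup-time-period": warmup,
            "clients": clients,
        })
    return schedule


schedule = named_tasks([100, 500, 1000])
# Print the tasks as JSON so they can be copied into the track file.
print(json.dumps(schedule, indent=2))
```

This only takes care of the duplicate names; it does not change the fact that the runs share one cluster and are executed one after another.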

However, I suggest something else because the way you have modelled it now, the system is not in an identical state in all three cases and thus your resulting benchmark may be prone to ordering bias.

You can instead define the bulk size as a track parameter and provide it externally:

{
  "challenges": [
    {
      "name": "index-and-query",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "delete-index"
          }
        },
        {
          "operation": {
            "operation-type": "create-index"
          }
        },
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            }
          }
        },
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": {{bulk_size | default(100)}}
          },
          "warmup-time-period": 120,
          "clients": 8
        }
      ]
    }
  ]
}

You can then change the bulk size to e.g. 500 when you invoke the benchmark with the command line argument --track-params="bulk_size:500". I suggest that you always specify the bulk size explicitly and don't rely on the default.
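
To run the whole series you can simply invoke Rally once per bulk size. Below is a rough Python sketch that only builds and prints the command lines. The track path /path/to/my-track is a placeholder, the helper name rally_commands is made up, and I am assuming a Rally version that uses the race subcommand together with --track-path and --challenge; adjust this to your setup.

```python
import shlex


def rally_commands(bulk_sizes, track_path="/path/to/my-track",
                   challenge="index-and-query"):
    """Return one esrally command line (as an argument list) per bulk size.

    track_path is a hypothetical placeholder; the challenge name is the one
    from the track above.
    """
    commands = []
    for size in bulk_sizes:
        commands.append([
            "esrally", "race",
            f"--track-path={track_path}",
            f"--challenge={challenge}",
            # Always pass the bulk size explicitly instead of using the default.
            f"--track-params=bulk_size:{size}",
        ])
    return commands


for cmd in rally_commands([100, 500, 1000]):
    # Dry run: only print the command. To actually execute it, you could
    # call subprocess.run(cmd, check=True) here instead.
    print(shlex.join(cmd))
```

Because every invocation is a separate race, each bulk size starts from the same initial state.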

If you run esrally list races afterwards, Rally will show you the bulk size that you have specified on the command line:

Race Timestamp    Track     Track Parameters    Challenge                       Car       User Tags
----------------  --------  ------------------  ------------------------------  --------  -----------
20180202T094640Z  geonames  bulk_size=500       append-no-conflicts-index-only  defaults
20180202T094617Z  geonames  bulk_size=100       append-no-conflicts-index-only  defaults

Also, if you use a dedicated metrics store, each sample will store the track parameters that you have specified on the command line so you can easily tell them apart.

Finally, your conditions are more reproducible because Elasticsearch is started before and stopped after each benchmark, so every bulk size runs against a cluster in the same state.

Thanks, Daniel
