How to use the same operation multiple times inside the same schedule


(Dipanjan Haldar) #1

Hi,

I want to create indexes from the same data set with different bulk sizes. So what I want to achieve is basically:

  1. create an index with bulk-size =100
  2. Delete the index
  3. create the same index with bulk-size=500
  4. delete the index
  5. create the same index with bulk-size=1000
  6. delete the index and so on

However, if I specify the delete operation multiple times inside the schedule for the challenge, Rally says it is a duplicate. Is it not possible to use the same operation multiple times within a schedule?

"challenges": [
  {
    "name": "index-and-query",
    "default": true,
    "schedule": [
      {
        "operation": {
          "operation-type": "delete-index"
        }
      },
      {
        "operation": {
          "operation-type": "create-index"
        }
      },
      {
        "operation": {
          "operation-type": "cluster-health",
          "request-params": {
            "wait_for_status": "green"
          }
        }
      },
      {
        "operation": {
          "operation-type": "bulk",
          "bulk-size": 5000
        },
        "warmup-time-period": 120,
        "clients": 8
      },
      {
        "operation": {
          "operation-type": "delete-index"
        }
      },
      {
        "operation": {
          "operation-type": "create-index"
        }
      },
      {
        "operation": {
          "operation-type": "bulk",
          "bulk-size": 10000
        },
        "warmup-time-period": 120,
        "clients": 8
      }
    ]
  }
]


(Daniel Mitterdorfer) #2

Hi,

the reason for this error message is that Rally would otherwise mix up the metrics of all the bulk operations. So you need to give each repeated operation a unique name, e.g.:

{
  "name": "bulk-10000",
  "operation": {
    "operation-type": "bulk",
    "bulk-size": 10000
  },
  "warmup-time-period": 120,
  "clients": 8
}
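Applied to the schedule in the question, each repeated task gets its own name so Rally can keep the metrics apart. A sketch of the two bulk steps (the names `bulk-5000` and `bulk-10000` are arbitrary labels of my own choosing):

```json
[
  {
    "name": "bulk-5000",
    "operation": { "operation-type": "bulk", "bulk-size": 5000 },
    "warmup-time-period": 120,
    "clients": 8
  },
  {
    "name": "bulk-10000",
    "operation": { "operation-type": "bulk", "bulk-size": 10000 },
    "warmup-time-period": 120,
    "clients": 8
  }
]
```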

However, I suggest something else because the way you have modelled it now, the system is not in an identical state in all three cases and thus your resulting benchmark may be prone to ordering bias.

You can instead define the bulk size as a track parameter and provide it externally:

{
  "challenges": [
    {
      "name": "index-and-query",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "delete-index"
          }
        },
        {
          "operation": {
            "operation-type": "create-index"
          }
        },
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            }
          }
        },
        {
          "operation": {
            "operation-type": "bulk",
            "bulk-size": {{bulk_size | default(100)}}
          },
          "warmup-time-period": 120,
          "clients": 8
        }
      ]
    }
  ]
}

You can then change the bulk size to e.g. 500 by invoking the benchmark with the command-line argument --track-params="bulk_size:500". I suggest that you always specify the bulk size explicitly rather than relying on the default.
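A whole sweep over several bulk sizes then becomes a small shell loop. This is only a sketch: the ./my-track path is a placeholder for wherever your custom track lives, and the commands are echoed first so you can review them before dropping the echo to actually run:

```shell
# Run the same challenge once per bulk size, passed as a track parameter.
# ./my-track is a placeholder path; remove the leading "echo" to execute.
for size in 100 500 1000; do
  echo esrally --track-path=./my-track --track-params="bulk_size:${size}"
done
```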

If you run esrally list races afterwards, Rally will show you the bulk size that you have specified on the command line:

Race Timestamp    Track     Track Parameters    Challenge                       Car       User Tags
----------------  --------  ------------------  ------------------------------  --------  -----------
20180202T094640Z  geonames  bulk_size=500       append-no-conflicts-index-only  defaults
20180202T094617Z  geonames  bulk_size=100       append-no-conflicts-index-only  defaults

Also, if you use a dedicated metrics store, each sample will store the track parameters that you have specified on the command line so you can easily tell them apart.

Finally, your conditions are more reproducible because Elasticsearch is always started and stopped correctly before each benchmark.


(Dipanjan Haldar) #3

Thanks Daniel


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.