Rally custom benchmark-only pipeline / track without corpora

Hi there,
I have an already provisioned Elasticsearch setup with an index that consists of around 30 million documents. I would like to benchmark this existing setup using esrally. For this, esrally provides the benchmark-only pipeline. However, I would like to use my own custom searches (terms aggregations, significant_terms aggregations and so on) to benchmark my setup. How can this be achieved? Is it possible to customize the benchmark-only pipeline, or do I have to set up a custom track? If a custom track is needed, the docs say that the corpora element is mandatory. But I don't need a custom corpus, because I do not want to index any documents.

Many thanks in advance,
Sebastian

Benchmark-only is a mode, not a track, so you will need to create a custom track which describes the operations and workload. Unfortunately I think a corpus is still mandatory, even though I would expect running custom queries against existing data to be quite a common use case that does not require one. Rally originated as a tool for regression performance testing, which generally involved benchmarking (and often also setting up and provisioning) an empty cluster. For that use case it makes sense to make a data set mandatory.

Now that Rally is used for a wider range of use cases, I think it would make sense to relax these constraints and only require a corpus if there are operations that rely on one. I would recommend creating an enhancement request.

Corpora and index definitions are not required with pipeline=benchmark-only.

For example this ad-hoc track:

{
  "version": 2,
  "description": "Example query only benchmark",
  "challenges": [
    {
      "name": "query-only",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            }
          }
        },
        {
          "operation": {
            "name": "query-match-all",
            "operation-type": "search",
            "index": "logs",
            "body": {
              "query": {
                "match_all": {}
              }
            }
          },
          "clients": 8,
          "warmup-iterations": 1000,
          "iterations": 1000,
          "target-throughput": 100
        }
      ]
    }
  ]
}

works fine (saved as track.json in the directory passed to --track-path) with a command like:

esrally --pipeline=benchmark-only --track-path=$PWD --target-hosts=127.0.0.1:39200 --on-error=abort

You can also see such an example in the docs here: https://esrally.readthedocs.io/en/stable/track.html#a-track-with-a-single-task
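
For the aggregation searches mentioned in the original question, the same pattern should carry over by changing the search body. The following is only a sketch, not a tested track: it reuses the "logs" index from the example above, and the field names "status" and "tags", the value "error", the operation names, and the client and iteration counts are all hypothetical placeholders that would need to be replaced with values matching the real mapping.

```json
{
  "version": 2,
  "description": "Sketch of an aggregation-only benchmark (field names are placeholders)",
  "challenges": [
    {
      "name": "aggregations-only",
      "default": true,
      "schedule": [
        {
          "operation": {
            "name": "terms-agg-by-status",
            "operation-type": "search",
            "index": "logs",
            "body": {
              "size": 0,
              "aggs": {
                "by_status": {
                  "terms": {
                    "field": "status",
                    "size": 10
                  }
                }
              }
            }
          },
          "clients": 1,
          "warmup-iterations": 100,
          "iterations": 100
        },
        {
          "operation": {
            "name": "significant-terms-for-errors",
            "operation-type": "search",
            "index": "logs",
            "body": {
              "size": 0,
              "query": {
                "term": {
                  "status": "error"
                }
              },
              "aggs": {
                "significant_tags": {
                  "significant_terms": {
                    "field": "tags"
                  }
                }
              }
            }
          },
          "clients": 1,
          "warmup-iterations": 100,
          "iterations": 100
        }
      ]
    }
  ]
}
```

Setting "size" to 0 in each body asks Elasticsearch to return only the aggregation results and no hits, which is usually what you want when the aggregation itself is the thing being measured. Both aggregations assume the placeholder fields are mapped as keyword fields (or another type that supports them).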

I'll create an issue to document this better in the track anatomy and/or the corpora and index reference definitions.

Many thanks, Dimitrios, for pointing this out :slight_smile: This is exactly what is missing in the docs, which is why I raised the question here.

You are welcome. I opened https://github.com/elastic/rally/issues/990.
