Rally custom benchmark-only pipeline / track without corpora

crab86 · May 10, 2020, 9:08am

Hi there,
I have an already provisioned Elasticsearch setup with an index that consists of around 30 million documents. I would like to benchmark this existing setup using esrally, which provides the benchmark-only pipeline for this purpose. However, I would like to use my own custom searches (terms aggregations, significant_terms aggregations and so on) to benchmark my setup. How can this be achieved? Is it possible to customize the benchmark-only pipeline, or do I have to set up a custom track? If a custom track is needed, the docs say that the corpora element is mandatory, but I don't need a custom corpus because I do not want to index any documents.

Many thanks in advance,
Sebastian

Christian_Dahlqvist · May 10, 2020, 12:25pm

Benchmark-only is a mode, not a track, so you will need to create a custom track which describes the operations and workload. Unfortunately I think a corpora definition is still mandatory, even though I would expect running custom queries against existing data to be quite a common use case that does not require one. Rally originated as a tool for regression performance testing, which generally involved benchmarking (and often also setting up and provisioning) against an empty cluster. For that use case it makes sense to make a data set mandatory.

Now that Rally is used for a wider range of use cases, I think it would make sense to relax these constraints and only require a corpora definition if there are operations that rely on one. I would recommend creating an enhancement request.

dliappis · May 11, 2020, 6:39am

Corpora and index definitions are not required with pipeline=benchmark-only.

For example this ad-hoc track:

{
  "version": 2,
  "description": "Example query only benchmark",
  "challenges": [
    {
      "name": "query-only",
      "default": true,
      "schedule": [
        {
          "operation": {
            "operation-type": "cluster-health",
            "request-params": {
              "wait_for_status": "green"
            }
          }
        },
        {
          "operation": {
            "name": "query-match-all",
            "operation-type": "search",
            "index": "logs",
            "body": {
              "query": {
                "match_all": {}
              }
            }
          },
          "clients": 8,
          "warmup-iterations": 1000,
          "iterations": 1000,
          "target-throughput": 100
        }
      ]
    }
  ]
}

works fine with a command like:

esrally --pipeline=benchmark-only --track-path=$PWD --target-hosts=127.0.0.1:39200 --on-error=abort

You can also see such an example in the docs here: https://esrally.readthedocs.io/en/stable/track.html#a-track-with-a-single-task

I'll create an issue to document this better in the track anatomy and/or the corpora and index reference definitions.
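
For the aggregation searches mentioned in the original question, the same pattern should carry over: only the "body" of the search operation changes. The following is a rough sketch of two entries that could go into the "schedule" array of the track above. The field names my_keyword_field and my_text_field, the term value "error", the operation names, and the iteration and throughput numbers are all placeholders chosen for illustration, not values from this thread; they need to be replaced with fields that actually exist in your index.

```json
[
  {
    "operation": {
      "name": "terms-agg-example",
      "operation-type": "search",
      "index": "logs",
      "body": {
        "size": 0,
        "aggs": {
          "top_values": {
            "terms": {
              "field": "my_keyword_field",
              "size": 10
            }
          }
        }
      }
    },
    "clients": 8,
    "warmup-iterations": 100,
    "iterations": 100,
    "target-throughput": 10
  },
  {
    "operation": {
      "name": "significant-terms-agg-example",
      "operation-type": "search",
      "index": "logs",
      "body": {
        "size": 0,
        "query": {
          "term": {
            "my_keyword_field": "error"
          }
        },
        "aggs": {
          "significant_values": {
            "significant_terms": {
              "field": "my_text_field"
            }
          }
        }
      }
    },
    "clients": 8,
    "warmup-iterations": 100,
    "iterations": 100,
    "target-throughput": 10
  }
]
```

Setting "size" to 0 means only the aggregation results are returned, not the matching documents, which is usually what you want to measure for aggregation-heavy workloads. The significant_terms example includes a query because that aggregation compares the documents matching the query against the rest of the index.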

crab86 · May 11, 2020, 8:47pm

Many thanks Dimitrios for pointing this out. This is exactly what is missing in the docs, which is why I raised the question here.

dliappis · May 12, 2020, 6:36am

You are welcome. I opened https://github.com/elastic/rally/issues/990.

