How can I reuse already indexed data in the next race?

Hi,

I would like to build a custom track. In the end, it should be something like a lite version of https://github.com/elastic/rally-eventdata-track, but simpler and based on our own dataset and the queries behind our main Kibana dashboard.

First, I need to ingest some historical data. I expect this to take a long time, but this ingestion is not what I mainly want to benchmark. So I would like to index the historical data only once and reuse it in the following races.

How can I do this?

I found the parameter --preserve-install, which keeps the installation and its data below the races directory. But how can I reuse these instances in subsequent races?

Thanks, Andreas

Hi @asp,

admittedly, what you want to achieve is currently a bit tricky. IMHO you have two options:

  • Ingest the data and reuse the cluster that contains it (by specifying --preserve-install=true on the command line). The cluster will be in ~/.rally/benchmarks/races/$TIMESTAMP/rally-node-0/install on each of the target nodes. Afterwards, you need to start the cluster from there yourself (instead of having Rally do it for you) and run Rally with --pipeline=benchmark-only so that it does not provision a cluster (see the docs for details).
  • Use snapshots: This also requires an initial ingestion with --preserve-install=true. You'd then create a snapshot with the snapshot API and store it in a place your benchmark infrastructure can access (e.g. an internal web server or a private S3 bucket). In your track, you'd then need a custom runner that restores that snapshot as the first step.
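
For the first option, the two phases could be scripted roughly as follows. This is a minimal sketch, not something Rally ships: it assumes a Rally version that uses the `race` subcommand, and the track path, the challenge names `ingest-history` and `dashboard-queries`, and the target host are hypothetical placeholders. The query challenge should not delete and recreate the indices, otherwise the preserved data is gone.

```python
from __future__ import annotations

import os

# Hypothetical values: adjust to your own track and cluster.
TRACK_PATH = os.path.expanduser("~/rally-tracks/my-track")
TARGET_HOSTS = "127.0.0.1:9200"


def ingest_command(track_path: str, challenge: str) -> list[str]:
    """Phase 1: Rally provisions a cluster, ingests the historical data
    and keeps the installation (including its data) after the race."""
    return [
        "esrally", "race",
        f"--track-path={track_path}",
        f"--challenge={challenge}",
        "--preserve-install=true",
    ]


def preserved_install_dir(race_timestamp: str, node: str = "rally-node-0") -> str:
    """Where the preserved cluster lives on each target node."""
    races = os.path.expanduser("~/.rally/benchmarks/races")
    return os.path.join(races, race_timestamp, node, "install")


def benchmark_command(track_path: str, challenge: str, target_hosts: str) -> list[str]:
    """Phase 2: only run the query challenge against a cluster that you
    started yourself from the preserved installation directory."""
    return [
        "esrally", "race",
        f"--track-path={track_path}",
        f"--challenge={challenge}",
        "--pipeline=benchmark-only",
        f"--target-hosts={target_hosts}",
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Between the two phases you would start Elasticsearch on each node from the directory returned by `preserved_install_dir()`.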

IMHO the second option is the cleaner approach because it allows you to restore the data without having to keep the cluster around. We plan to add out-of-the-box support for restoring from a snapshot, so this will become simpler in the future.
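
For the second option, the custom runners could look roughly like the sketch below. It assumes the older, synchronous Rally runner API (a runner is called as `runner(es, params)` and registered in `track.py` via `register(registry)`) and an elasticsearch-py 7.x client; newer Rally releases expect async runners registered with `async_runner=True`, so check the docs for your version. The runner names and parameter keys (`repository`, `url`, `snapshot`, `indices`) are hypothetical. It uses a read-only `url` repository for the web server case, which requires that URL to be allowed via `repositories.url.allowed_urls` in elasticsearch.yml; for a private S3 bucket you would use the `s3` repository type from the repository-s3 plugin instead. Note that a restore fails if an open index with the same name already exists, so run it against a fresh cluster.

```python
# track.py (sketch): custom runners that restore pre-ingested data.


def create_repository(es, params):
    """Register a read-only URL repository pointing at the snapshot files
    served by an internal web server."""
    es.snapshot.create_repository(
        repository=params["repository"],
        body={
            "type": "url",
            "settings": {"url": params["url"]},
        },
    )
    return {"weight": 1, "unit": "ops", "success": True}


def restore_snapshot(es, params):
    """Restore the snapshot and block until the restore has finished, so
    that the following query tasks see the complete data set."""
    es.snapshot.restore(
        repository=params["repository"],
        snapshot=params["snapshot"],
        body={"indices": params.get("indices", "*")},
        wait_for_completion=True,
    )
    return {"weight": 1, "unit": "ops", "success": True}


def register(registry):
    # Makes both runners available as operation types in the track.
    registry.register_runner("create-url-repository", create_repository)
    registry.register_runner("restore-history-snapshot", restore_snapshot)
```

In the challenge, operations of type `create-url-repository` and `restore-history-snapshot` would run first, followed by the dashboard queries.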

Daniel
