How to reuse already indexed data in next race?

asp · March 15, 2018, 8:33am

Hi,

I would like to build a custom track. In the end it should be a lite version of https://github.com/elastic/rally-eventdata-track, but simplified and using our own dataset and our main Kibana dashboard queries.

I need to ingest some historical data first. I expect this to take a long time, but ingestion is not what I mainly want to test. So I would like to index the historical data only once and reuse the already indexed data in the following races.

How can I do this?

I found the parameter preserve-install, which keeps the installation and the data below the races directory. But how can I reuse these instances in the following races?

Thanks, Andreas

danielmitterdorfer · March 15, 2018, 8:57am

Hi @asp,

Admittedly, what you want to achieve is a bit tricky at the moment. IMHO you have two options:

1. Ingest the data and reuse the cluster that contains the data (by specifying --preserve-install=true on the command line). The cluster will be in ~/.rally/benchmarks/races/$TIMESTAMP/rally-node-0/install on each of the target nodes. Afterwards, you need to start the cluster yourself (instead of having Rally do it for you) and run Rally with --pipeline=benchmark-only (see the docs for details).
2. Use snapshots: This also requires that you do an initial ingestion with --preserve-install=true. You'd then create a snapshot with the snapshot API. You then need to store that snapshot in a place that your benchmark infrastructure can access (e.g. an internal web server or a private S3 bucket). For your track, you'd then need a custom runner that restores that snapshot in the first step.
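
For the first option, the two Rally invocations could look roughly like the sketch below. It only assembles the command lines; --preserve-install=true and --pipeline=benchmark-only are the flags mentioned above, while the use of --track-path and --target-hosts, the example track directory, and the host address are assumptions for illustration. Newer Rally releases may also expect a subcommand after esrally, so check the docs for your version.

```python
def initial_ingest_command(track_path):
    """One-off race that ingests the data and keeps the installation on disk."""
    return [
        "esrally",
        f"--track-path={track_path}",      # assumption: custom track in a local directory
        "--preserve-install=true",         # keep the cluster and its data after the race
    ]


def benchmark_only_command(track_path, target_hosts="127.0.0.1:9200"):
    """Later races against the preserved cluster that you started yourself."""
    return [
        "esrally",
        f"--track-path={track_path}",
        "--pipeline=benchmark-only",       # Rally does not provision or start a cluster
        f"--target-hosts={target_hosts}",  # assumption: where your own cluster listens
    ]


if __name__ == "__main__":
    # Hypothetical track location, purely for illustration.
    print(" ".join(initial_ingest_command("~/tracks/my-track")))
    print(" ".join(benchmark_only_command("~/tracks/my-track")))
```

For the later races, the track should not delete or recreate the indices that hold the historical data, otherwise the preserved data is lost again.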

IMHO the second option is the cleaner approach because it allows you to restore the data without the need to keep the cluster. We plan to add support for restoring from a snapshot out of the box, so this will be simpler in the future.
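
A custom runner for the snapshot approach might look like the following minimal sketch for a track's track.py. The operation name restore-snapshot and all parameter names (repository, snapshot, repository-type, repository-settings, indices) are made up for this example. It also assumes a synchronous runner and the keyword arguments of the 6.x/7.x Python client; newer Rally and client versions may need an async runner and different argument names.

```python
def restore_snapshot(es, params):
    """Register the snapshot repository and restore the snapshot into the cluster.

    `es` is the Elasticsearch client that Rally passes to a runner, and
    `params` holds the operation parameters from the track. All parameter
    names used here are hypothetical.
    """
    repository = params["repository"]
    snapshot = params["snapshot"]

    # Registering a repository that already exists just updates its settings.
    es.snapshot.create_repository(
        repository=repository,
        body={
            "type": params["repository-type"],
            "settings": params["repository-settings"],
        },
    )

    # Block until the restore has finished so that the following tasks
    # query the complete data set.
    es.snapshot.restore(
        repository=repository,
        snapshot=snapshot,
        body={"indices": params.get("indices", "*")},
        wait_for_completion=True,
    )

    # A runner may return (weight, unit); the restore counts as one operation.
    return 1, "ops"


def register(registry):
    # Rally calls register() when it loads the track plugin.
    registry.register_runner("restore-snapshot", restore_snapshot)
```

The track would then run an operation of type restore-snapshot as its first task. Note that a restore fails if an open index with the same name already exists in the target cluster.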

Daniel
