I would like to build a custom track. In the end it should be a lite version of https://github.com/elastic/rally-eventdata-track, but simplified and using our own dataset and our main Kibana dashboard queries.
I need to ingest some historical data first. I expect this to take a long time, but ingestion is not what I mainly want to test. So I want to index the historical data only once and reuse the already indexed data for subsequent races.
How can I do this?
I found the parameter preserve-install, which keeps the installation and the data below the races directory. But how can I reuse these instances in subsequent races?
Admittedly, what you want to achieve is a bit tricky currently. IMHO you have two options:
1. Ingest the data and reuse the cluster that contains the data (by specifying --preserve-install=true on the command line). The cluster will be in ~/.rally/benchmarks/races/$TIMESTAMP/rally-node-0/install on each of the target nodes. Afterwards, you need to start the cluster yourself (instead of having Rally do it for you) and run the following races with --pipeline=benchmark-only (see the docs for details).
2. Use snapshots: This also requires that you do an initial ingestion with --preserve-install=true. You'd then create a snapshot with the snapshot API. You then need to store that snapshot in a place that your benchmark infrastructure can access (e.g. an internal web server or a private S3 bucket). For your track, you'd then need a custom runner that restores that snapshot as the first step.
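To make the snapshot steps of the second option a bit more concrete, here is a minimal sketch of the three snapshot API requests involved (register a repository, create the snapshot, restore it). It is only an illustration under assumptions: the cluster address, the repository name `rally-history` and the snapshot name `historical-data` are made up, and it uses a shared-filesystem (`fs`) repository for simplicity, which requires the location to be listed under `path.repo` in elasticsearch.yml. For an S3 bucket or a web server you would use a different repository type.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: address of the preserved cluster
REPO = "rally-history"             # hypothetical repository name
SNAPSHOT = "historical-data"       # hypothetical snapshot name


def register_repo_request(location):
    """Request that registers a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"{ES_URL}/_snapshot/{REPO}", body


def create_snapshot_request(indices):
    """Request that snapshots the given indices and waits until it is done."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    url = f"{ES_URL}/_snapshot/{REPO}/{SNAPSHOT}?wait_for_completion=true"
    return "PUT", url, body


def restore_snapshot_request():
    """Request that restores the snapshot into a fresh cluster."""
    url = f"{ES_URL}/_snapshot/{REPO}/{SNAPSHOT}/_restore?wait_for_completion=true"
    return "POST", url, {"include_global_state": False}


def send(method, url, body):
    """Send one JSON request to Elasticsearch and return the parsed response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a running cluster you would call, for example, `send(*register_repo_request("/mnt/backups/rally"))` followed by `send(*create_snapshot_request(["my-index"]))` once after the initial ingestion, and `send(*restore_snapshot_request())` on the benchmark cluster. The path and index name here are placeholders.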
IMHO the second option is the cleaner approach because it allows you to restore the data without the need to keep the cluster. We plan to add support for restoring from a snapshot out of the box, so this will be simpler in the future.
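A custom runner for the restore step could look roughly like the sketch below, placed in the track's track.py. Treat it as an assumption-laden outline rather than a finished implementation: the operation name `restore-snapshot` and the parameter names `repository` and `snapshot` are invented for this example, and the synchronous form shown here matches older Rally releases (newer ones expect an `async def` runner registered with `async_runner=True`). The `es` argument is the Elasticsearch Python client that Rally passes to runners.

```python
def restore_snapshot(es, params):
    """Restore a previously created snapshot before the query tasks run.

    `params` holds the properties of the operation as defined in the track;
    "repository" and "snapshot" are hypothetical names chosen for this sketch.
    """
    es.snapshot.restore(
        repository=params["repository"],
        snapshot=params["snapshot"],
        body={"include_global_state": False},
        # Block until the restore has finished so later tasks see all data.
        wait_for_completion=True,
    )
    # Report one completed operation to Rally as (weight, unit).
    return 1, "ops"


def register(registry):
    """Entry point that Rally calls when it loads track.py."""
    registry.register_runner("restore-snapshot", restore_snapshot)
```

In the track you would then define an operation with `"operation-type": "restore-snapshot"` plus the two parameters, and schedule it as the first task of the challenge so that the dashboard queries run against the restored indices.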