Hey,
Thanks for your interest.
To create high disk load with Rally you could of course use expensive queries (plus parallel indexing), but another angle is to rely on ingest with a large percentage of document updates.
You could take a look at a blog article I co-authored where we used Rally to simulate high I/O (to assist with a kernel bug investigation): https://www.elastic.co/blog/canonical-elastic-and-google-team-up-to-prevent-data-corruption-in-linux
In the reproduction scripts linked there you'll see we rely on a bulk-update challenge; if you additionally set id_seq_low_id_bias to true and tweak the update probability, you'll end up with a very disk-I/O-heavy workload.
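As a sketch of what that could look like on the command line (the challenge and parameter names come from the rally-eventdata-track, the probability parameter name and all values here are illustrative — double-check the track's README for the exact names it accepts):

```shell
# Run the eventdata track's bulk-update challenge, biasing updates
# towards low (older) document ids to maximise random disk I/O.
# id_seq_probability is my assumption for the update-probability knob.
esrally race --track-repository=eventdata \
  --track=eventdata \
  --challenge=bulk-update \
  --track-params="id_seq_low_id_bias:true,id_seq_probability:0.9"
```

Biasing towards low ids means updates keep touching documents spread across older, already-merged segments, which drives up merge and read I/O rather than just appending.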
You can also take a look at the index-and-query-logs-fixed-daily-volume eventdata challenge (see the docs/parameters here: https://github.com/elastic/rally-eventdata-track#index-and-query-logs-fixed-daily-volume) and increase the number of search clients and other parameters.
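For that challenge the invocation could look something like this (parameter names are the ones documented in the README linked above; the values are illustrative, so tune them to your hardware):

```shell
# Combine fixed-volume daily indexing with concurrent querying;
# raising search_clients adds read load on top of the ingest I/O.
esrally race --track-repository=eventdata \
  --track=eventdata \
  --challenge=index-and-query-logs-fixed-daily-volume \
  --track-params="number_of_days:4,daily_logging_volume:100GB,search_clients:8"
```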
Dimitris