IO / Disc intensive load testing

Hello,
My task is to give Elasticsearch's disk I/O a "hard life" :-). To avoid reinventing the wheel: is there a tool for simulating high disk load on Elasticsearch?
Which type of query is the most disk I/O intensive? I have seen many articles and best practices on how to tune performance, but my task really is to kill disk I/O for Elasticsearch, and I didn't find any useful public information on the web.

Thank you in advance.

Welcome!

You could use Rally for this. It's a purpose-built benchmarking tool for Elasticsearch.

There are several specimen datasets available and you can configure the load to put on the cluster.

Thank you for the answer, @whatgeorgemade. I have seen Rally. I am probably wrong, but my understanding is that Rally is mostly used to benchmark different Elasticsearch versions, i.e. comparing the performance of version A against version B.

In my case, I am trying to generate high disk load. For example, a very complicated query that requires a full index scan, or heavy write insertions. I don't know what kind of operations/queries these could be. For example, relational databases "hate" nested joins; I can hang a DB with this kind of query.
In the case of Elasticsearch, what kind of queries/insertions can cause disk saturation?

Does Rally have such capabilities?

Heavy indexing is probably the easiest way to saturate disk I/O, assuming you have sufficient network throughput and CPU so that neither becomes the bottleneck first.
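
If you want to try heavy indexing by hand before setting up a proper benchmark, here is a minimal sketch of building a bulk request body in Python. It assumes the standard `_bulk` endpoint with newline-delimited JSON; the index name `io-stress`, the document shape, and the helper names are made up for illustration, and `send_bulk` assumes an unsecured cluster reachable at whatever base URL you pass in.

```python
import json
import random
import string
import urllib.request


def random_text(rng, length):
    """Return a pseudo-random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase + " ") for _ in range(length))


def build_bulk_body(index_name, num_docs, text_length=200, seed=0):
    """Build an NDJSON body for the _bulk API: one action line per document line."""
    rng = random.Random(seed)
    lines = []
    for i in range(num_docs):
        # Action line: index the next line into index_name (hypothetical index).
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Document line: the shape here is an arbitrary example.
        lines.append(json.dumps({"id": i, "message": random_text(rng, text_length)}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(base_url, body):
    """POST a bulk body to <base_url>/_bulk (assumes no authentication or TLS setup)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `send_bulk("http://localhost:9200", build_bulk_body("io-stress", 5000))` in a loop from several processes would be one rough way to push sustained writes, though a dedicated tool will give you better control over the load and proper measurements.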

You can use Rally to benchmark your own cluster. If you have data already, you can even benchmark writing those documents, and running your own queries.

If you do have your own data, the best way to start is by creating a new track based on one of your existing indices. Have a look here to see how that's done.

If you don't have your own data, you can look through the sample datasets and use one that is close to the type of document you're going to be indexing.

OK, what kind of documents should they be? Should it be a massive index of simple documents, or something more like Wikipedia? By default, Elasticsearch indexes every JSON field, right? If I simulate load with 100 to 1000 fields in each document, does the number of fields affect the indexing effort and, as a result, performance?

Thanks in advance.

I am not sure what the relationship between document size and disk I/O is, so I suspect you will need to test. I do not think you need very large documents to saturate I/O, so you can probably use one of the existing Rally datasets.
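
If you do want to test the field-count question yourself, a rough sketch like the following could generate synthetic documents with a configurable number of fields. The field naming scheme, value length, and helper names are assumptions for illustration only, not anything Rally provides.

```python
import json
import random
import string


def make_document(num_fields, value_length=20, seed=0):
    """Return a flat dict with num_fields string fields (hypothetical names field_0000, ...)."""
    rng = random.Random(seed)
    return {
        f"field_{i:04d}": "".join(rng.choices(string.ascii_lowercase, k=value_length))
        for i in range(num_fields)
    }


def serialized_size(doc):
    """Size in bytes of the document once encoded as JSON."""
    return len(json.dumps(doc).encode("utf-8"))


if __name__ == "__main__":
    # Compare the payload size of documents with 10, 100 and 1000 fields.
    for n in (10, 100, 1000):
        print(n, "fields:", serialized_size(make_document(n)), "bytes")
```

You could index batches of each variant and compare disk metrics between runs. Keep in mind that an index's mapping has a field-count limit (`index.mapping.total_fields.limit`, 1000 by default), so the largest variant may need that setting raised.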