Hello,
My task is to give Elasticsearch's disk I/O a "hard life" :-). To avoid reinventing the wheel: is there a tool for simulating high disk load on Elasticsearch?
Also, what type of query is the most disk-I/O intensive? I have seen many articles and best-practice guides on tuning performance, but my task is really to kill Elasticsearch's disk I/O, and I didn't find any useful open information on the web.
Thank you for the answer @whatgeorgemade. I saw Rally. I am probably wrong, but as I understand it, Rally is mostly used to benchmark different Elasticsearch versions, i.e. comparing the performance of version A against version B.
In my case, I am trying to cause high disk load. For example, a very complicated query that requires a full index scan, or heavy bulk writes; I don't know exactly what kind of operations/queries those would be. By way of comparison, relational databases "hate" nested joins, and I can hang a database with that kind of query.
In the case of Elasticsearch, what kind of queries/insertions can cause disk saturation?
Heavy indexing is probably the easiest way to saturate disk I/O, assuming you have sufficient network throughput and CPU so that neither becomes a bottleneck.
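As a rough illustration (not a tuned benchmark), you can drive sustained bulk indexing from several parallel clients and watch disk I/O with iostat or iotop. The index name stress-test, the host localhost:9200, and the bulk.ndjson payload file are all placeholders in this sketch:

```bash
# Replay the same newline-delimited bulk payload from 8 parallel clients.
# bulk.ndjson should contain alternating action/document lines, e.g.:
#   {"index":{}}
#   {"some_field":"some value"}
for client in $(seq 1 8); do
  while true; do
    curl -s -o /dev/null -H 'Content-Type: application/x-ndjson' \
         -XPOST 'http://localhost:9200/stress-test/_bulk' \
         --data-binary @bulk.ndjson
  done &
done
wait   # Ctrl-C stops the background loops
```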
You can use Rally to benchmark your own cluster. If you have data already, you can even benchmark writing those documents, and running your own queries.
If you do have your own data, the best way to start is by creating a new track based on one of your existing indices. Have a look here to see how that's done.
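For reference, recent Rally versions ship a create-track subcommand that generates a track from an existing index. This is only a sketch, so double-check the flags against your Rally version; the track name, index name, and output path below are placeholders:

```bash
esrally create-track --track=my-track \
  --target-hosts=localhost:9200 \
  --indices="my-index" \
  --output-path=~/rally-tracks
```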
If you don't have your own data, you can look through the sample datasets and use one that is close to the type of document you're going to be indexing.
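As a sketch, assuming Rally 2.x command syntax, you can list the bundled sample tracks and then run a write-heavy one such as http_logs against your own cluster (--pipeline=benchmark-only targets an already-running cluster instead of provisioning one):

```bash
esrally list tracks
esrally race --track=http_logs \
  --target-hosts=localhost:9200 \
  --pipeline=benchmark-only
```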
OK, what kind of documents should they be? A massive index of simple documents, or something more like Wikipedia? By default Elasticsearch indexes every JSON field, right? If I simulate load with 100-1000 fields per document, does the number of fields affect the indexing effort and, as a result, performance?
I am not sure what the relationship between document size and disk I/O is, so I suspect you will need to test. I do not think you need very large documents to saturate I/O, so you can probably use one of the existing Rally datasets.
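If you want to test the field-count question directly, here is a minimal sketch that generates synthetic documents with N fields each and bulk-indexes them; vary N and watch disk I/O. The index name field-test and the field names are made up for illustration, and note that Elasticsearch's default mapping limit is 1000 fields per index:

```bash
N=500                 # fields per document (default mapping limit: 1000)
BULK_FILE=$(mktemp)
for doc in $(seq 1 200); do
  echo '{"index":{}}' >> "$BULK_FILE"
  # build one document as {"field_1":"v_...", ..., "field_N":"v_..."}
  fields=$(for i in $(seq 1 "$N"); do printf '"field_%d":"v_%d_%d",' "$i" "$doc" "$i"; done)
  echo "{${fields%,}}" >> "$BULK_FILE"
done
curl -s -H 'Content-Type: application/x-ndjson' \
     -XPOST 'http://localhost:9200/field-test/_bulk' --data-binary "@$BULK_FILE"
```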