Hello,
My task is to give Elasticsearch's disk I/O a "hard life" :-). To avoid reinventing the wheel: is there a tool for simulating high disk load on Elasticsearch?
Also, what type of query is the most disk-I/O intensive? I have seen many articles and best-practice guides on tuning performance, but my task is really to kill Elasticsearch's disk I/O, and I didn't find any useful open information on the web.
Thank you for the answer @whatgeorgemade. I saw Rally. I am probably wrong, but as I understand it, Rally is mostly used to benchmark different Elasticsearch versions, i.e. comparing the performance of version A against version B.
In my case, I am trying to cause high disk load. For example, a very complicated query that requires a full index scan, or heavy bulk writes; I don't know exactly what kind of operations/queries those would be. By way of comparison, relational databases "hate" nested joins, and I can hang a database with that kind of query.
In the case of Elasticsearch, what kind of queries/insertions can cause disk saturation?
Heavy indexing is probably the easiest way to saturate disk I/O, assuming you have sufficient network throughput and CPU so that neither becomes a bottleneck.
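As a rough illustration (not a tuned benchmark), you can drive sustained bulk indexing from several parallel clients and watch disk I/O with iostat or iotop. The index name stress-test, the host localhost:9200, and the bulk.ndjson payload file are all placeholders in this sketch:

```bash
# Replay the same newline-delimited bulk payload from 8 parallel clients.
# bulk.ndjson should contain alternating action/document lines, e.g.:
#   {"index":{}}
#   {"some_field":"some value"}
for client in $(seq 1 8); do
  while true; do
    curl -s -o /dev/null -H 'Content-Type: application/x-ndjson' \
         -XPOST 'http://localhost:9200/stress-test/_bulk' \
         --data-binary @bulk.ndjson
  done &
done
wait   # Ctrl-C stops the background loops
```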
You can use Rally to benchmark your own cluster. If you have data already, you can even benchmark writing those documents, and running your own queries.
If you do have your own data, the best way to start is by creating a new track based on one of your existing indices. Have a look here to see how that's done.
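For reference, recent Rally versions ship a create-track subcommand that generates a track from an existing index. This is only a sketch, so double-check the flags against your Rally version; the track name, index name, and output path below are placeholders:

```bash
esrally create-track --track=my-track \
  --target-hosts=localhost:9200 \
  --indices="my-index" \
  --output-path=~/rally-tracks
```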
If you don't have your own data, you can look through the sample datasets and use one that is close to the type of document you're going to be indexing.
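As a sketch, assuming Rally 2.x command syntax, you can list the bundled sample tracks and then run a write-heavy one such as http_logs against your own cluster (--pipeline=benchmark-only targets an already-running cluster instead of provisioning one):

```bash
esrally list tracks
esrally race --track=http_logs \
  --target-hosts=localhost:9200 \
  --pipeline=benchmark-only
```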
OK, what kind of documents should they be? A massive index of simple documents, or something more like Wikipedia? By default Elasticsearch indexes every JSON field, right? If I simulate load with 100-1000 fields per document, does the number of fields affect the indexing effort and, as a result, performance?
I am not sure what the relationship between document size and disk I/O is, so I suspect you will need to test. I do not think you need very large documents to saturate I/O, so you can probably use one of the existing Rally datasets.
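If you want to test the field-count question directly, here is a minimal sketch that generates synthetic documents with N fields each and bulk-indexes them; vary N and watch disk I/O. The index name field-test and the field names are made up for illustration, and note that Elasticsearch's default mapping limit is 1000 fields per index:

```bash
N=500                 # fields per document (default mapping limit: 1000)
BULK_FILE=$(mktemp)
for doc in $(seq 1 200); do
  echo '{"index":{}}' >> "$BULK_FILE"
  # build one document as {"field_1":"v_...", ..., "field_N":"v_..."}
  fields=$(for i in $(seq 1 "$N"); do printf '"field_%d":"v_%d_%d",' "$i" "$doc" "$i"; done)
  echo "{${fields%,}}" >> "$BULK_FILE"
done
curl -s -H 'Content-Type: application/x-ndjson' \
     -XPOST 'http://localhost:9200/field-test/_bulk' --data-binary "@$BULK_FILE"
```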