Recently we have started to benchmark our Elasticsearch cluster using Rally. We have designed a custom workload to be able to find the desired shard size for us.
We are really interested in generating a chart like the one shown in the Elasticon talk Quantitative Cluster Sizing (at 24:30).
For those charts I used a different benchmarking tool, as Rally was not able to handle this at that point, and the dashboards were created in Kibana 4, which was relatively new back then.
Rally has changed and evolved a lot since then but does not provide anything like this out of the box, so you will need to create them yourself. Fortunately Kibana has gotten a lot more advanced and flexible over the years so I am sure you will be able to create something a lot nicer.
As long as you have an Elasticsearch instance set up as a metrics store for Rally, it should be possible to generate this type of dashboard. The metrics store holds results for every individual request, so as long as you have applied tags and metadata that identify which iteration each result belongs to, it should be relatively easy to create a saved search with a filter and build this kind of visualization on top of it.
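As a rough sketch of the filter behind such a saved search, the function below builds an Elasticsearch query body that keeps only service time samples carrying an iteration tag. The field names `race-id`, `name` and `sample-type` reflect my understanding of Rally's metrics records, and `meta.iteration` is a hypothetical metadata field you would have to set yourself in the track, so check all of them against the documents in your own metrics store.

```python
def build_service_time_query(race_id, iteration_field="meta.iteration"):
    """Build a search body for service_time samples tagged with an iteration.

    `iteration_field` is an assumed name for a custom metadata field;
    it only exists if the track sets it on each operation.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Restrict to a single benchmark run.
                    {"term": {"race-id": race_id}},
                    # Keep only service time samples.
                    {"term": {"name": "service_time"}},
                    # Skip warmup samples.
                    {"term": {"sample-type": "normal"}},
                    # Require the iteration tag to be present.
                    {"exists": {"field": iteration_field}},
                ]
            }
        }
    }
```

The same filters can be entered in Kibana as the query of a saved search. The dict form is only meant to make the conditions explicit.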
I saw your talk, loved it, and I am running some tests on our own data as you suggested. I used an All-in-One instance, indexed logging data with one shard and no replicas, and then varied some parameters and ran tests with and without best compression enabled. I now have over 70M documents and would like to generate visualizations from my indices to see when I hit a limit for my shard size.
Can you please provide an example of which fields you based yours on, or how to set up this kind of visualization?
When I created the rally-eventdata-track I included a challenge showing an example of how to run shard sizing. It seems this was removed during a recent cleanup of the track. In this challenge I simply ran X indexing iterations of a fixed number of documents each, followed by a set of queries. For each operation I set the iteration number as a metadata field. This allowed me to create a saved query showing service time for the queries. I then created a simple histogram based on the iteration number showing the average latency per type of query.
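To illustrate what that histogram computes, here is a minimal sketch that groups samples by iteration number and query type and averages their values. The record layout (`iteration`, `operation`, `value` keys) is a simplified assumption for illustration, not the exact shape of Rally's metrics documents.

```python
from collections import defaultdict


def average_service_time(samples):
    """Average `value` per (iteration, operation) pair.

    `samples` is an iterable of dicts with the hypothetical keys
    "iteration", "operation" and "value".
    """
    # Running [sum, count] for each (iteration, operation) pair.
    totals = defaultdict(lambda: [0.0, 0])
    for sample in samples:
        key = (sample["iteration"], sample["operation"])
        totals[key][0] += sample["value"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Plotting these averages with the iteration number on the x-axis and one series per query type gives the kind of chart described above. In Kibana the equivalent would be a histogram on the iteration field with an average metric split by operation.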