We intend to test the ingest and query performance of clusters hosted on ECE using Rally. So far, the elastic/logs track looks interesting. I studied this video and pdf on Rally testing pitfalls by the Rally author Daniel Mitterdorfer. The last topic in there stresses the importance of repeating your test many times (>30) so that you can draw reliable conclusions from the results.
That triggers these questions:
How do you get your to-be-tested cluster on ECE into the same initial state before each test run, so that you can compare apples to apples? I guess you will need to delete and re-create your cluster each time and then optionally load a snapshot with the initial test data. Deleting and re-creating the cluster and loading the snapshot could be done via the respective APIs, allowing automated repeated testing.
How does ECE itself behave: will it have the same performance after deleting and creating a cluster say 100 times? Or will there be a build-up of discarded data? We use ECE version 3.3.0 (I realize this post covers both Rally and ECE functionality).
Best practices in general on using Rally against ECE-hosted clusters are very welcome too.
> How do you get your to-be-tested cluster on ECE into the same initial state before each test run, so that you can compare apples to apples? I guess you will need to delete and re-create your cluster each time and then optionally load a snapshot with the initial test data.
We recommend benchmarking against a new cluster for each benchmark run; that is how the ES nightly benchmarks are done.
> How does ECE itself behave: will it have the same performance after deleting and creating a cluster say 100 times? Or will there be a build-up of discarded data? We use ECE version 3.3.0 (I realize this post covers both Rally and ECE functionality).
I would not expect ECE to behave differently after 5, 10, or 1000 clusters. When a cluster deployment is deleted, so is its data. When doing this type of benchmarking, we tend to use Elasticsearch node sizes large enough to consume an entire allocator, but it is not required.
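If you want to automate those repeated runs, a driver script along these lines may work. This is only a rough sketch: it assumes the ECE deployments API under `/api/v1/deployments` (create, get, `_shutdown`, delete), a pre-built create-deployment request body in `deployment.json`, and a top-level `healthy` flag in the GET response; check the exact endpoints and fields against the API reference for your ECE version. The coordinator URL and credentials below are placeholders.

```python
import json
import time
import requests

ECE_URL = "https://ece-coordinator.example.com:12443"  # placeholder admin URL
AUTH = ("admin", "<password>")                         # placeholder credentials

def create_deployment(payload):
    # Create a new deployment from a previously prepared request body.
    r = requests.post(f"{ECE_URL}/api/v1/deployments", json=payload, auth=AUTH)
    r.raise_for_status()
    return r.json()["id"]

def wait_until_healthy(deployment_id, timeout=1800):
    # Poll the deployment until ECE reports it healthy.
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(f"{ECE_URL}/api/v1/deployments/{deployment_id}", auth=AUTH)
        r.raise_for_status()
        if r.json().get("healthy"):
            return
        time.sleep(30)
    raise TimeoutError(f"deployment {deployment_id} did not become healthy in time")

def delete_deployment(deployment_id):
    # A deployment has to be shut down before it can be deleted.
    requests.post(f"{ECE_URL}/api/v1/deployments/{deployment_id}/_shutdown",
                  auth=AUTH).raise_for_status()
    requests.delete(f"{ECE_URL}/api/v1/deployments/{deployment_id}",
                    auth=AUTH).raise_for_status()

with open("deployment.json") as f:
    payload = json.load(f)

for run in range(31):  # >30 repetitions, per the Rally guidance mentioned above
    dep_id = create_deployment(payload)
    wait_until_healthy(dep_id)
    # ... restore the initial-data snapshot and start the Rally race here ...
    delete_deployment(dep_id)
```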
The elastic/logs track is a great track for benchmarking. For indexing, you will want to mind bulk_size and bulk_indexing_clients so as not to overwhelm your deployments. And once you are satisfied with your indexing benchmarks, you can certainly create a new deployment, restore the snapshot, then run logging-querying without the bulk-index and compression-stats tasks to run just the query workflows.
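For that query-only run, the Rally invocation could look roughly like the sketch below. Treat it as an illustration only: the target host and credentials are placeholders, and the track, challenge, and task names should be verified against the current elastic/logs track documentation.

```python
import subprocess

# Query-only run against an existing deployment that already holds the restored data.
subprocess.run([
    "esrally", "race",
    "--track=elastic/logs",
    "--challenge=logging-querying",
    "--exclude-tasks=bulk-index,compression-stats",
    "--pipeline=benchmark-only",
    "--target-hosts=my-cluster.example.com:9243",  # placeholder endpoint
    ("--client-options=use_ssl:true,"
     "basic_auth_user:'elastic',basic_auth_password:'<password>'"),  # placeholders
    # For indexing runs you would instead tune the track parameters, e.g.
    # "--track-params=bulk_indexing_clients:8,bulk_size:5000",
], check=True)
```

Wrapping a call like this in the create/restore/delete loop sketched earlier is one way to automate the 30+ repetitions.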