Best practices for testing against clusters on ECE

Hello,

We intend to test the ingest and query performance of clusters hosted on ECE using Rally. So far, the elastic/logs track looks interesting. I studied this video and PDF on Rally testing pitfalls by the Rally author Daniel Mitterdorfer. The last topic there stresses the importance of repeating each test many (>30) times, so that you can base sound conclusions on the results.

That triggers these questions:

  1. How do you get your to-be-tested cluster on ECE into the same initial state before each test run, so that you can make apples-to-apples comparisons? I guess you need to delete and re-create the cluster each time, and then optionally load a snapshot with the initial test data. Deleting and creating the cluster and loading the snapshot could all be done via the respective APIs, allowing automated repeated testing.

  2. How does ECE itself behave: will it still perform the same after a cluster has been deleted and created, say, 100 times? Or will discarded data build up? We use ECE version 3.3.0 (I realize this post covers both Rally and ECE functionality).

Best practices in general on using Rally against ECE-hosted clusters are very welcome too.

Thanks!
Jan Stap

Hi Jan,

Thank you for your post.

  1. How do you get your to-be-tested cluster on ECE into the same initial state before each test run, so that you can make apples-to-apples comparisons? I guess you need to delete and re-create the cluster each time, and then optionally load a snapshot with the initial test data.

We recommend benchmarking against a new cluster for each benchmark run; that is how the Elasticsearch nightly benchmarks are done.
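The delete-then-recreate cycle can be scripted against the ECE RESTful API. Below is a minimal sketch; the coordinator URL, API key, deployment id, and payload file are all hypothetical placeholders, and the shutdown-then-delete sequence is an assumption about the deployments API, so check it against the API reference for your ECE version before relying on it.

```shell
#!/bin/sh
# Placeholders -- replace with your own coordinator address and API key.
ECE_URL="https://ece.example.com:12443"
API_KEY="REPLACE_ME"

# Pure helper: the path of a specific deployment resource.
dep_path() {
  printf '/api/v1/deployments/%s' "$1"
}

# Authenticated call against the ECE RESTful API; extra args go to curl.
ece_api() {
  method="$1"; path="$2"; shift 2
  curl -sk -X "$method" \
    -H "Authorization: ApiKey $API_KEY" \
    -H "Content-Type: application/json" \
    "$@" "$ECE_URL$path"
}

# One benchmark iteration: tear down the old deployment, create a fresh
# one from a stored payload, then (outside this sketch) restore the
# snapshot and start the Rally race.
run_iteration() {
  dep_id="$1"
  ece_api POST   "$(dep_path "$dep_id")/_shutdown"
  ece_api DELETE "$(dep_path "$dep_id")"
  ece_api POST   "/api/v1/deployments" --data @create-deployment.json
}
```

Looping over `run_iteration` from a driver script gives you the 30+ repetitions against a fresh deployment each time.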

  2. How does ECE itself behave: will it still perform the same after a cluster has been deleted and created, say, 100 times? Or will discarded data build up? We use ECE version 3.3.0 (I realize this post covers both Rally and ECE functionality).

I would not expect ECE to behave differently after 5, 10, or 1000 clusters. When a deployment is deleted, so is its data. For this type of benchmarking we tend to use Elasticsearch node sizes large enough to consume an entire allocator, but that is not required.

The elastic/logs track is a great track for benchmarking. For indexing, you will want to tune bulk_size and bulk_indexing_clients so as not to overwhelm your deployments. And once you are satisfied with your indexing benchmarks, you can certainly create a new deployment, restore the snapshot, and then run logging-querying without the bulk-index and compression-stats tasks to exercise just the query workflows.
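A query-only run along those lines might look like this on the command line; the host, credentials, and track parameter values are illustrative placeholders, not recommendations:

```shell
# Run only the query workflows of the elastic/logs track against an
# existing deployment that already holds the restored snapshot data.
esrally race \
  --track=elastic/logs \
  --challenge=logging-querying \
  --exclude-tasks="bulk-index,compression-stats" \
  --pipeline=benchmark-only \
  --target-hosts=my-cluster.ece.example.com:9243 \
  --client-options="use_ssl:true,basic_auth_user:'elastic',basic_auth_password:'REPLACE_ME'"

# For indexing runs, bulk_size and bulk_indexing_clients can be set via
# track parameters, e.g. (values are illustrative only):
#   --track-params="bulk_size:5000,bulk_indexing_clients:8"
```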

In ECE, container cgroup CPU time is scheduled using the Completely Fair Scheduler (CFS). In case you haven't seen it, take a look at Manage your installation capacity | Elastic Cloud Enterprise Reference [3.6] | Elastic to see how the CPU quota is calculated.
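As a concrete illustration of the quota arithmetic (cgroup v1 paths; the numbers below are made up, not taken from a real allocator): the effective CPU allowance of a container is its CFS quota divided by the CFS period.

```shell
# Example with illustrative values -- on a live allocator you would read
# these from the container's cgroup, e.g.
#   /sys/fs/cgroup/cpu/<container>/cpu.cfs_quota_us
quota_us=200000     # hypothetical cpu.cfs_quota_us value
period_us=100000    # the default CFS period
echo "effective allowance: $((quota_us / period_us)) CPUs"
```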

Thank you,
Jason

Hi Jason,

Thanks for your detailed answer, and for pointing out the ECE CPU scheduler; I had seen it before, but I'm now better aware of it.

Best regards,
Jan

