Here is how I benchmark my existing ES cluster:
- I have an existing ES cluster running on OpenShift.
- The cluster has 5 nodes, each with exactly the same resources.
- The ES cluster is exposed through a single endpoint, so from outside it behaves like a single instance; OpenShift handles the load balancing.
- I created a custom track with my own index data.
- I ran the race with a docker command like:
docker run --rm -v ${pwd}/esrally/.rally:/rally/.rally -v ${pwd}/esrally/reports:/rally/reports elastic/rally:2.10.0 race --pipeline=benchmark-only --target-host=remote.host.com:80 --track=my-custom-track --challenge=default --report-file=/rally/reports/my-custom-track-rally-report.md --report-format=markdown --on-error=abort --offline
But when I ran a few rounds of races (against the same version of ES) and compared the results, I found that the differences can be very large. For example:
| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|--------------------------------------------------------------:|---------------------:|-----------------:|--------------:|--------------:|-------:|---------:|
| Cumulative indexing time of primary shards | | 63.0863 | 28.0918 | -34.9945 | min | -55.47% |
| Min cumulative indexing time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative indexing time across primary shard | | 0.000741667 | 0 | -0.00074 | min | -100.00% |
| Max cumulative indexing time across primary shard | | 3.20992 | 3.29123 | 0.08132 | min | +2.53% |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min | 0.00% |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Cumulative merge time of primary shards | | 123.878 | 1.08262 | -122.796 | min | -99.13% |
| Cumulative merge count of primary shards | | 24024 | 22 | -24002 | | -99.91% |
| Min cumulative merge time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative merge time across primary shard | | 0.00045 | 0 | -0.00045 | min | -100.00% |
| Max cumulative merge time across primary shard | | 17.5161 | 0.566267 | -16.9498 | min | -96.77% |
| Cumulative merge throttle time of primary shards | | 0.259333 | 0.1108 | -0.14853 | min | -57.28% |
| Min cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Max cumulative merge throttle time across primary shard | | 0.134967 | 0.110767 | -0.0242 | min | -17.93% |
| Cumulative refresh time of primary shards | | 48.9657 | 3.7339 | -45.2318 | min | -92.37% |
| Cumulative refresh count of primary shards | | 224377 | 984 | -223393 | | -99.56% |
| Min cumulative refresh time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative refresh time across primary shard | | 0.003475 | 0 | -0.00347 | min | -100.00% |
| Max cumulative refresh time across primary shard | | 5.46093 | 0.523717 | -4.93722 | min | -90.41% |
| Cumulative flush time of primary shards | | 6.23043 | 2.11223 | -4.1182 | min | -66.10% |
| Cumulative flush count of primary shards | | 14090 | 132 | -13958 | | -99.06% |
| Min cumulative flush time across primary shard | | 0 | 0 | 0 | min | 0.00% |
| Median cumulative flush time across primary shard | | 0.00205 | 0 | -0.00205 | min | -100.00% |
| Max cumulative flush time across primary shard | | 0.50635 | 0.337367 | -0.16898 | min | -33.37% |
| Total Young Gen GC time | | 69.005 | 71.058 | 2.053 | s | +2.98% |
| Total Young Gen GC count | | 5992 | 5938 | -54 | | -0.90% |
| Total Old Gen GC time | | 1.064 | 0.816 | -0.248 | s | -23.31% |
| Total Old Gen GC count | | 15 | 10 | -5 | | -33.33% |
| Store size | | 11.527 | 11.6336 | 0.10655 | GB | +0.92% |
| Translog size | | 0.6923 | 0.11595 | -0.57635 | GB | -83.25% |
| Heap used for segments | | 3.9797 | 3.93391 | -0.04579 | MB | -1.15% |
| Heap used for doc values | | 0.778618 | 0.690968 | -0.08765 | MB | -11.26% |
| Heap used for terms | | 2.65974 | 2.70258 | 0.04285 | MB | +1.61% |
| Heap used for norms | | 0.338989 | 0.346619 | 0.00763 | MB | +2.25% |
| Heap used for points | | 0 | 0 | 0 | MB | 0.00% |
| Heap used for stored fields | | 0.202354 | 0.193741 | -0.00861 | MB | -4.26% |
| Segment count | | 421 | 402 | -19 | | -4.51% |
| Total Ingest Pipeline count | | 0 | 0 | 0 | | 0.00% |
| Total Ingest Pipeline time | | 0 | 0 | 0 | ms | 0.00% |
| Total Ingest Pipeline failed | | 0 | 0 | 0 | | 0.00% |
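
As a rough aid for reading the table above, a small script can pull out the rows whose `Diff %` exceeds a threshold. This is only a sketch: the function names `parse_diff_percent` and `large_diffs` and the 20% threshold are illustrative assumptions, not part of Rally, and the sample rows are copied from the comparison above.

```python
def parse_diff_percent(cell):
    """Turn a cell such as '-55.47%' or '+2.53%' into a float."""
    return float(cell.strip().rstrip("%"))


def large_diffs(table_lines, threshold=20.0):
    """Return (metric, diff_percent) for rows where |Diff %| >= threshold.

    Assumes the 7-column layout shown above:
    Metric | Task | Baseline | Contender | Diff | Unit | Diff %
    """
    rows = []
    for line in table_lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 7:
            continue
        try:
            pct = parse_diff_percent(cells[6])
        except ValueError:
            continue  # header or separator row
        if abs(pct) >= threshold:
            rows.append((cells[0], pct))
    return rows


# A few rows copied from the comparison table above.
sample = [
    "| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |",
    "|---:|---:|---:|---:|---:|---:|---:|",
    "| Cumulative merge count of primary shards | | 24024 | 22 | -24002 | | -99.91% |",
    "| Total Young Gen GC count | | 5992 | 5938 | -54 | | -0.90% |",
    "| Store size | | 11.527 | 11.6336 | 0.10655 | GB | +0.92% |",
]

for metric, pct in large_diffs(sample):
    print(f"{metric}: {pct:+.2f}%")
```

On the full table, this separates the cumulative indexing, merge, refresh, flush and translog rows (which differ by 50% or more) from the GC, store size and heap rows (which mostly stay within a few percent).
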
Questions:
- Why are some of the results so different between races, even though every race was run against exactly the same version of ES, with the same data set and the same track?
- How can I make sure the report reflects the real performance of the ES cluster?
- What are the recommended steps for benchmarking a remote ES cluster running on OpenShift?
Thanks.