I've read "Are there docs to help interpret the results?" and "ESRally Metrics Definition" but still have some questions about interpreting results, specifically those from a tournament. In the results below, index-append throughput is only ~10% better for the contender, but the indexing, merge, refresh, flush, and merge throttle times are far better (roughly 57% to 96% lower).
The baseline runs on two r4.xlarge instances with provisioned IOPS (io1) EBS volumes, while the contender runs on two i3.xlarge instances using the local NVMe SSDs. Rally is invoked on an r4.xlarge.
I'd expect index-append throughput to improve on the SSDs by a margin similar to the other metrics. What am I missing?
| Metric | Task | Baseline | Contender | Diff | Unit |
|-------------------------------:|------------------------------:|------------:|------------:|---------:|-------:|
| Indexing time | | 160.712 | 69.6534 | -91.0585 | min |
| Merge time | | 1973.32 | 82.5694 | -1890.75 | min |
| Refresh time | | 211.762 | 12.6196 | -199.143 | min |
| Flush time | | 2.50997 | 0.824467 | -1.6855 | min |
| Merge throttle time | | 214.459 | 55.5885 | -158.871 | min |
| Min Throughput | index-append | 531.094 | 582.043 | 50.9495 | docs/s |
| Median Throughput | index-append | 585.235 | 640.781 | 55.5465 | docs/s |
| Max Throughput | index-append | 616.81 | 685.16 | 68.3507 | docs/s |
| 50th percentile latency | index-append | 5319.24 | 4880.9 | -438.338 | ms |
| 90th percentile latency | index-append | 7861.49 | 6394.39 | -1467.1 | ms |
| 99th percentile latency | index-append | 10389.6 | 8166.32 | -2223.31 | ms |
| 100th percentile latency | index-append | 11361.2 | 9180.74 | -2180.46 | ms |
| 50th percentile service time | index-append | 5319.24 | 4880.9 | -438.338 | ms |
| 90th percentile service time | index-append | 7861.49 | 6394.39 | -1467.1 | ms |
| 99th percentile service time | index-append | 10389.6 | 8166.32 | -2223.31 | ms |
| 100th percentile service time | index-append | 11361.2 | 9180.74 | -2180.46 | ms |
| error rate | index-append | 0 | 0 | 0 | % |
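
To put the gap in relative terms, here is a small Python sketch that computes the percent change of the contender against the baseline for the rows above. The `results` dictionary and the `pct_change` helper are my own names for illustration, not part of Rally; the numbers are copied from the table.

```python
# Baseline and contender values copied from the tournament table above.
# The dict layout and helper name are illustrative, not part of Rally.
results = {
    "Indexing time (min)": (160.712, 69.6534),
    "Merge time (min)": (1973.32, 82.5694),
    "Refresh time (min)": (211.762, 12.6196),
    "Flush time (min)": (2.50997, 0.824467),
    "Merge throttle time (min)": (214.459, 55.5885),
    "Min throughput (docs/s)": (531.094, 582.043),
    "Median throughput (docs/s)": (585.235, 640.781),
    "Max throughput (docs/s)": (616.81, 685.16),
    "50th percentile latency (ms)": (5319.24, 4880.9),
    "99th percentile latency (ms)": (10389.6, 8166.32),
}


def pct_change(baseline, contender):
    """Relative change of the contender versus the baseline, in percent."""
    return (contender - baseline) / baseline * 100.0


changes = {name: pct_change(b, c) for name, (b, c) in results.items()}

for name, change in changes.items():
    # Negative is better for times and latencies, positive is better for throughput.
    print(f"{name:32s} {change:+7.1f}%")
```

Run as-is, this shows throughput moving by roughly +10% while the time metrics drop by well over half, which is the mismatch I'm asking about.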