What kind of optimizations is Elastic using on its benchmarking hosts? I have a bare-metal server with 20 CPU cores, 64 GB of RAM, and a mounted NVMe drive, but when I run Rally locally with esrally --distribution-version=6.4.0 --track=nyc_taxis --car="4gheap"
I get at most about 58k docs/s, while Elastic reports around 80k docs/s for its add-4g test (single node).
I have gone through the important system configurations, but there is still a gap of about 20k docs/s. Based on what is described in the benchmarking methodology and environment, I doubt that a hardware difference alone explains a gap this large.
I believe we are running Rally on a separate host and are using 10G networking, so if you are running Rally on the Elasticsearch host, that may explain the difference. What do CPU usage and disk I/O look like during indexing?
What Christian is saying is correct. We do have the load test driver on a dedicated machine. Please check https://elasticsearch-benchmarks.elastic.co/ for the detailed hardware and software configuration. We intentionally run with the stock configuration as much as possible, so we also don't do any kernel tuning, for example (apart from the changes that are required to run Elasticsearch and that you mentioned in your original post). There is only one exception: we turn on transparent huge pages. The reason is that in earlier kernel versions (IIRC before 4.12.2) the default was always, and it later changed to madvise; we have only "tuned" this so that historical results stay comparable.
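If you want to see which transparent huge pages mode your own host is using, here is a minimal sketch. It assumes a Linux host that exposes the usual sysfs file /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is the one in square brackets. The function names are mine, for illustration only, and are not part of Rally.

```python
from pathlib import Path

# Standard sysfs location on Linux; its content looks like "always [madvise] never".
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active mode, i.e. the token wrapped in square brackets."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {text!r}")


def current_thp_mode(path=THP_PATH):
    """Return the active THP mode, or None if the file cannot be read."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        # Not Linux, or THP is not available on this kernel.
        return None


if __name__ == "__main__":
    print("transparent huge pages:", current_thp_mode())
```

If this reports a different mode than the benchmark hosts use, that is one more variable to rule out when comparing numbers.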
Before every benchmark we run a setup routine to make the results more reproducible. We always set up a fresh file system on the disk, issue a TRIM, and drop the page cache. See "Is your Elasticsearch TRIMmed" for more background info.
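To make the three steps concrete, here is a rough sketch of what such a routine could look like on a Linux host. This is not our actual script. The device, mount point, and file system type are placeholders, and the exact commands (mkfs, fstrim, sysctl vm.drop_caches) are my assumption of one reasonable way to do it. The code only prints the commands, because creating a file system destroys everything on the device.

```python
import shlex


def build_setup_commands(device, mount_point, fs_type="ext4"):
    """Build (but do not run) the commands for a pre-benchmark disk reset.

    All arguments are hypothetical examples; adjust them to your own host.
    """
    return [
        ["umount", mount_point],                # release the old file system
        ["mkfs", "-t", fs_type, device],        # fresh file system (DESTROYS DATA)
        ["mount", device, mount_point],
        ["fstrim", "-v", mount_point],          # issue a TRIM for the unused blocks
        ["sync"],                               # flush dirty pages before dropping
        ["sysctl", "-w", "vm.drop_caches=3"],   # drop page cache, dentries and inodes
    ]


if __name__ == "__main__":
    # Placeholder device and mount point, printed for review only.
    for argv in build_setup_commands("/dev/nvme0n1p1", "/mnt/es-data"):
        print(shlex.join(argv))
```

Review the printed commands and run them by hand as root, and only against a disk that holds nothing but benchmark data.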
@Christian_Dahlqvist that could be it. I noticed that during indexing, some python3 processes would sometimes show spikes in I/O while reading the dataset, and a few esrally processes were taking up a small but not negligible amount of CPU.
I'll try to get a 10G connection between two servers so I can test that idea. When I tried a remote session, the results were not much different, but the load test driver was on a 1G connection.
Thanks.
I used iotop, and this is a snippet of what it showed during indexing:
For single-node benchmarks a 1 Gbit connection is usually fine. You should check, though, whether the network is saturated. The important thing is to avoid resource contention between Elasticsearch and Rally. Also, loopback behaves a bit differently from Ethernet (e.g. different MTU, different code paths in the kernel).
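One way to check for saturation is to sample the interface byte counters while the benchmark is running. Below is a minimal sketch that assumes a Linux host and reads /proc/net/dev. The helper names and the interface name in the usage note are placeholders I made up, and you have to supply the link speed yourself.

```python
import time


def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, seconds, link_bits_per_s):
    """Return (rx, tx) as fractions of the link speed over the interval."""
    rx = (after[0] - before[0]) * 8 / seconds / link_bits_per_s
    tx = (after[1] - before[1]) * 8 / seconds / link_bits_per_s
    return rx, tx


def measure(interface, link_bits_per_s, seconds=1.0):
    """Sample /proc/net/dev twice and return (rx, tx) utilization."""
    with open("/proc/net/dev") as f:
        before = parse_net_dev(f.read())[interface]
    time.sleep(seconds)
    with open("/proc/net/dev") as f:
        after = parse_net_dev(f.read())[interface]
    return utilization(before, after, seconds, link_bits_per_s)
```

For example, measure("eth0", 1e9, seconds=5) on the load driver during indexing returns the receive and transmit utilization of a 1 Gbit link as fractions. Values that stay close to 1.0 in either direction suggest the link is the bottleneck.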