Elasticsearch Benchmarking docs/s

What kind of optimizations is Elastic using on their benchmarking hosts? I have a bare-metal server with 20 CPU cores, 64 GB of RAM, and a mounted NVMe drive, but when I run Rally locally with
esrally --distribution-version=6.4.0 --track=nyc_taxis --car="4gheap"

I am getting at most about 58k docs/s, while Elastic is reporting around 80k docs/s for their add-4g test (single node).

I have gone through the important system configuration settings, but there is still about a 20k docs/s difference. Based on what is described in the benchmarking methodology and environment, I doubt a hardware difference accounts for a gap this large.
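For reference, the checks I ran look roughly like this (the recommended values in the comments are the ones from the Elasticsearch docs; this is a sketch, not my exact shell session):

# settings from "Important System Configuration"
sysctl vm.max_map_count    # should be at least 262144
sysctl vm.swappiness       # low value, or swap disabled entirely
swapon --show              # ideally prints nothing (no active swap)
ulimit -n                  # open file descriptors, at least 65535 for the ES user
ulimit -l                  # "unlimited" if bootstrap.memory_lock is enabled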

Is there something that I am missing?

Thanks.

What type of disk do you have?

NVMe SSD

I believe we are running Rally on a separate host over 10G networking, so if you are running Rally on the Elasticsearch host, that may explain the difference. What do CPU usage and disk I/O look like during indexing?
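For example (standard Linux tools; the sampling interval is arbitrary):

iostat -xm 5    # per-device utilization and throughput in MB/s
iotop -o        # per-process disk I/O, only processes actually doing I/O
top             # per-process CPU usage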

Hi,

What Christian is saying is correct. We do have the load test driver on a dedicated machine. Please check https://elasticsearch-benchmarks.elastic.co/ for the detailed hardware and software configuration. We intentionally run with stock configuration as much as possible, so we don't do any kernel tuning, for example (apart from the changes that are required to run Elasticsearch, which you mentioned in your original post). There is only one exception: we turn on transparent huge pages. The reason is that in earlier kernel versions (IIRC before 4.12.2) this was set to "always" and later changed to "madvise", and we have "tuned" this only so the historic results remain comparable.
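For reference, the setting can be inspected and pinned like this (a sketch; how we apply it on the benchmark machines may differ, and the change is not persistent across reboots):

cat /sys/kernel/mm/transparent_hugepage/enabled    # kernel marks the active mode in brackets
# pin it to the old default for comparable historic results (root required)
echo always > /sys/kernel/mm/transparent_hugepage/enabled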

Before every benchmark we run a setup routine for better reproducibility: we always set up a fresh file system on the disk, issue a TRIM, and drop the page cache. See Is your Elasticsearch TRIMmed for more background.
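In shell terms the routine looks roughly like this (device name, mount point, and file system are placeholders, not our exact scripts):

# WARNING: mkfs destroys all data on the device
mkfs.ext4 /dev/nvme0n1p1                   # fresh file system before every run
mount /dev/nvme0n1p1 /mnt/es-data
fstrim -v /mnt/es-data                     # issue a TRIM so the SSD starts from a known state
sync; echo 3 > /proc/sys/vm/drop_caches    # drop the page cache (root required)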

I'd start by putting the load test driver on a dedicated machine (see the sketch below). As a next step I'd look for bottlenecks (Seven tips for better Elasticsearch benchmarks has some pointers).
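A simple way to test this split is Rally's benchmark-only pipeline against an externally started node (the host address is a placeholder; note that --car does not apply here, since Rally is not provisioning the cluster):

# on the load driver machine; Elasticsearch 6.4.0 already running on the target host
esrally --pipeline=benchmark-only --track=nyc_taxis --target-hosts=192.168.1.10:9200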

Daniel

@Christian_Dahlqvist that could be it. I noticed that during indexing some python3 processes would occasionally spike in I/O while reading the dataset, and a few esrally processes were using a small but not negligible amount of CPU.

I'll try to get a 10G connection between two servers so I can test that idea. When I tried a remote session the results were not much different, but the load test driver was on a 1G connection.

Thanks.

I used iotop, and this is a snippet of what it showed during indexing:

Total DISK READ :      17.91 M/s | Total DISK WRITE :     139.19 M/s
Actual DISK READ:      17.91 M/s | Actual DISK WRITE:      40.24 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
16919 be/4 elksvr      0.00 B/s    2.24 M/s  0.00 %  1.67 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16918 be/4 elksvr      0.00 B/s    3.45 M/s  0.00 %  0.20 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16904 be/4 elksvr      0.00 B/s    5.27 M/s  0.00 %  0.17 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16907 be/4 elksvr      0.00 B/s    3.55 M/s  0.00 %  0.14 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16909 be/4 elksvr      0.00 B/s    3.42 M/s  0.00 %  0.10 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16923 be/4 elksvr      0.00 B/s    3.86 M/s  0.00 %  0.10 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16911 be/4 elksvr      0.00 B/s    4.48 M/s  0.00 %  0.06 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
  894 be/3 root        0.00 B/s 1037.65 K/s  0.00 %  0.04 % [jbd2/nvme0n1p1-]
16912 be/4 elksvr      0.00 B/s    2.93 M/s  0.00 %  0.03 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16924 be/4 elksvr      0.00 B/s    2.94 M/s  0.00 %  0.02 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16916 be/4 elksvr      0.00 B/s 2028.84 K/s  0.00 %  0.01 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16946 be/4 elksvr      0.00 B/s   75.21 M/s  0.00 %  0.01 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16957 be/4 elksvr      0.00 B/s   24.09 M/s  0.00 %  0.01 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16888 be/4 elksvr      4.60 M/s    0.00 B/s  0.00 %  0.00 % python3 /usr/local/bin/esra~track=nyc_taxis --car=4gheap
16890 be/4 elksvr      4.36 M/s    0.00 B/s  0.00 %  0.00 % python3 /usr/local/bin/esra~track=nyc_taxis --car=4gheap
16892 be/4 elksvr      4.48 M/s    0.00 B/s  0.00 %  0.00 % python3 /usr/local/bin/esra~track=nyc_taxis --car=4gheap
16893 be/4 elksvr      4.48 M/s    0.00 B/s  0.00 %  0.00 % python3 /usr/local/bin/esra~track=nyc_taxis --car=4gheap
16906 be/4 elksvr      0.00 B/s  367.82 K/s  0.00 %  0.00 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16908 be/4 elksvr      0.00 B/s  379.44 K/s  0.00 %  0.00 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16913 be/4 elksvr      0.00 B/s 1653.27 K/s  0.00 %  0.00 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16917 be/4 elksvr      0.00 B/s  491.72 K/s  0.00 %  0.00 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch
16922 be/4 elksvr      0.00 B/s 1970.76 K/s  0.00 %  0.00 % java -Xms4g -Xmx4g -XX:+Use~arch.bootstrap.Elasticsearch

and for CPU usage, top showed:

top - 10:44:28 up 5 days,  2:32,  4 users,  load average: 10.07, 8.65, 4.83
Tasks: 271 total,   1 running, 269 sleeping,   0 stopped,   1 zombie
%Cpu(s): 49.0 us,  1.5 sy,  0.0 ni, 49.4 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65940932 total,  7513392 free,  5570352 used, 52857188 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 59753956 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16666 elksvr    20   0 16.953g 5.491g 967124 S 925.9  8.7  91:51.84 java
16887 elksvr    20   0  185568  62184   5512 S  12.0  0.1   1:08.50 esrally
16882 elksvr    20   0  185576  62016   5512 S  11.6  0.1   1:10.67 esrally
16881 elksvr    20   0  185824  71728   5496 S  10.6  0.1   1:08.63 esrally
16883 elksvr    20   0  185836  62372   5512 S  10.6  0.1   1:10.57 esrally
16880 elksvr    20   0  185308  72400   5508 S  10.3  0.1   1:06.01 esrally
16885 elksvr    20   0  185816  71788   5512 S   9.6  0.1   1:08.91 esrally
16884 elksvr    20   0  185304  61740   5512 S   9.0  0.1   1:10.60 esrally
16886 elksvr    20   0  185564  62276   5512 S   7.6  0.1   1:08.21 esrally
16184 root      20   0   57120  16044   7516 S   5.0  0.0   0:31.51 iotop
16291 elksvr    20   0   40632   3888   3164 R   0.7  0.0   0:01.76 top

Hi,

For single node benchmarks a 1G connection is usually fine, but you should check whether the network is saturated. It's just important that you avoid resource contention between Elasticsearch and Rally. Also, loopback behaves a bit differently than Ethernet (e.g. different MTU, different code paths in the kernel).
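To check for saturation you can watch interface throughput during the run, for example (the interface name is a placeholder):

sar -n DEV 5           # rxkB/s and txkB/s per interface; compare against the 1G limit (~125 MB/s)
ip -s link show eth0   # cumulative RX/TX byte counters for a quick before/after delta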

Daniel
