Elasticsearch indexing performance below expectations at a single installation

We've got problems with the indexing rate of Elasticsearch at a customer's installation. Our system is a single-node Redis -> Logstash (5.6.1) -> Elasticsearch (5.6.1) pipeline.
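
For context, the pipeline configuration looks roughly like this (a simplified sketch, not our exact config; the Redis key and the index pattern are placeholders):

    input {
      redis {
        host      => "127.0.0.1"
        data_type => "list"
        key       => "logstash"                    # placeholder key name
      }
    }
    output {
      elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "logs-%{+YYYY.MM.dd}-%{level}"    # placeholder; indexes are per day and level
      }
    }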

We've run several different tests, both in our lab and in the customer's deployment:

  • Indexing 500 MB of test data with our test pipeline took 2 min 54 s; on the customer's system it took 9 min 06 s.
  • Another test with a different pipeline (a SIP/MEGACO call with 45 messages per call) managed 113 calls/s on our development server, but only 46 calls/s at the customer's site.

To isolate the problem, we wrote a test that indexes messages directly into Elasticsearch (a call with 29 messages, about 27 KB of JSON data). The results were:

  • development: 12,667 messages/second
  • customer: 3,427 messages/second
  • local MacBook Pro 2015: 7,966 messages/second
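
For reference, the direct-indexing test essentially fires bulk requests like this (simplified sketch; the index name, type and document body are placeholders, not our real mapping):

    # bulk.ndjson -- one action line plus one document line per message (placeholders):
    { "index": { "_index": "logs-2017.10.05-debug", "_type": "log" } }
    { "@timestamp": "2017-10-05T15:35:50Z", "level": "debug", "message": "..." }

    curl -s -H 'Content-Type: application/x-ndjson' \
         -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.ndjson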

Our customer has an SSD mounted; iostat shows around 15 MB/s of writes:

  Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
   vda               1.88         0.53        38.07         32       2284
   vdb             506.18        68.20     14209.00       4092     852540

   Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
   vda               2.40        71.13        29.27       4268       1756
   vdb             498.27       207.80     13844.40      12468     830664

   Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
   vda               3.50         7.00        35.07        420       2104
   vdb             575.10       396.27     17268.47      23776    1036108

   Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
   vda               3.08         3.73        46.53        224       2792
   vdb             584.55       326.00     18992.93      19560    1139576

   Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
   vda               1.73         6.27        18.47        376       1108
   vdb             592.80       584.13     19776.93      35048    1186616  
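
(The samples above come from plain iostat in device mode, roughly the following; judging from the kB totals the interval is about 60 seconds:)

    iostat -dk 60      # per-device stats in kB, one report every 60 seconds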

top usually shows a high load, but about a third of the CPU seems idle; the system doesn't keep pace with the logs being sent (the Redis queue grows until it hits its memory limit):

top - 15:35:50 up 23:12,  8 users,  load average: 13.91, 14.95, 14.63
Tasks: 245 total,  1 running, 244 sleeping,  0 stopped,  0 zombie
%Cpu0  : 20.6 us,  2.8 sy, 35.7 ni, 33.9 id,  1.0 wa,  0.0 hi,  1.4 si,  4.5 st
%Cpu1  : 38.5 us,  2.1 sy, 22.3 ni, 30.0 id,  3.5 wa,  0.0 hi,  0.0 si,  3.5 st
%Cpu2  :  9.0 us,  1.7 sy, 34.1 ni, 50.3 id,  1.4 wa,  0.0 hi,  0.0 si,  3.4 st
%Cpu3  : 14.5 us,  1.4 sy, 24.8 ni, 54.8 id,  1.0 wa,  0.0 hi,  0.0 si,  3.4 st
%Cpu4  : 13.8 us,  1.4 sy, 34.3 ni, 46.7 id,  0.7 wa,  0.0 hi,  0.0 si,  3.1 st
%Cpu5  : 43.9 us,  1.4 sy, 17.2 ni, 31.9 id,  1.4 wa,  0.0 hi,  0.0 si,  4.2 st
%Cpu6  : 20.0 us,  2.1 sy, 42.8 ni, 28.8 id,  2.1 wa,  0.0 hi,  0.0 si,  4.2 st
%Cpu7  : 21.1 us,  2.8 sy, 29.8 ni, 39.1 id,  3.1 wa,  0.0 hi,  0.0 si,  4.2 st
%Cpu8  : 24.6 us,  2.5 sy, 28.4 ni, 37.2 id,  3.2 wa,  0.0 hi,  0.0 si,  4.2 st
%Cpu9  : 15.0 us,  1.4 sy, 34.1 ni, 42.9 id,  2.8 wa,  0.0 hi,  0.3 si,  3.5 st
%Cpu10 : 14.2 us,  1.4 sy, 21.5 ni, 57.8 id,  1.4 wa,  0.0 hi,  0.0 si,  3.8 st
%Cpu11 : 17.4 us,  1.7 sy, 28.1 ni, 46.2 id,  2.8 wa,  0.0 hi,  0.3 si,  3.5 st
%Cpu12 : 16.8 us,  2.1 sy, 34.6 ni, 40.6 id,  1.7 wa,  0.0 hi,  0.3 si,  3.8 st
%Cpu13 : 18.1 us,  3.1 sy, 31.7 ni, 40.4 id,  2.8 wa,  0.0 hi,  0.0 si,  3.8 st
%Cpu14 : 11.8 us,  1.7 sy, 40.3 ni, 40.6 id,  2.1 wa,  0.0 hi,  0.0 si,  3.5 st
%Cpu15 : 31.9 us,  1.8 sy, 26.3 ni, 34.7 id,  1.8 wa,  0.0 hi,  0.0 si,  3.5 st
KiB Mem : 24621160 total,  1793300 free, 15553140 used,  7274720 buff/cache
KiB Swap:  2096124 total,  1945872 free,  150252 used.  8589640 avail Mem 
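
We watch the backlog with a simple list-length check against the Redis queue (the key name below is a placeholder for whatever list the Logstash redis input consumes):

    redis-cli llen logstash      # grows steadily while Elasticsearch falls behind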

Our development system:

VM with 8 CPUs assigned, 12 GB RAM
Logstash 5.6.1 with 8 workers
Elasticsearch 5.6.1, -Xms6g -Xmx6g

Chassis: DELL PowerEdge R630 
CPU: Intel E5-2670 v3 @ 2.30 GHz; 1 socket / 12 cores; HT enabled (24 vCPUs) 
RAM: 64 GB 
HDD: Seagate ST300MM0006 300GB SAS 10k 6Gb/s spinning disk 
RAID: DELL PERC H730P Mini 12Gb/s 2048MB Cache 
Volume1: RAID 10 (4 Disk 300GB SAS 10k) 
Hypervisor: VMware 6.0U1

Our customer's system:

VM with 16 CPUs assigned & pinned, 32 GB RAM
Logstash 5.6.1 with 16 workers
Elasticsearch 5.6.1 -Xms14405m -Xmx14405m

Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
KVM Hypervisor
SSD disk (consumer-grade)

Index settings for single log entries:

number_of_shards: 2,
index.refresh_interval: 5s

Index settings for summary log entries (there's a separate process creating and updating summary records):

number_of_shards: 1,
index.refresh_interval: 5s

Indexes are created per day and per level (debug, info, ...).
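
Concretely, an index for single log entries is created with settings along these lines (a sketch; the index name is a made-up example of our day/level scheme):

    curl -s -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/logs-2017.10.05-debug' -d '
    {
      "settings": {
        "index.number_of_shards": 2,
        "index.refresh_interval": "5s"
      }
    }'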

So what could be the cause of this problem? Thanks a lot!

Some hot threads captured while writing this:
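
(Captured with the standard hot threads API, assuming the node listens on the default local port:)

    curl -s 'http://localhost:9200/_nodes/hot_threads?threads=3'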

Can you provide a bit more information about how this works and what it does? How frequently are documents updated?

We had a bottleneck once when we updated these too frequently, directly with a script in the Logstash -> Elasticsearch pipe.

Thanks to your support we identified it and built a queue-based service; since then, that issue has been solved.
In the test that inserts bulk messages directly into Elasticsearch, without the Logstash pipe, performance is also bad, so I don't think we have another issue with updates.

At a smaller installation with 4 CPUs, I noticed with iotop that a single Elasticsearch thread has a much higher write rate than at the customer installation with 16 CPUs:

 2211 be/4 elastics    0.00 B/s  926.31 K/s  0.00 % 12.77 % java -Xms7148m -Xmx7148m -XX:+UseConcM~elasticsearch/elasticsearch.pid --quiet
 2212 be/4 elastics    0.00 B/s  930.15 K/s  0.00 % 12.54 % java -Xms7148m -Xmx7148m -XX:+UseConcM~elasticsearch/elasticsearch.pid --quiet
 2214 be/4 elastics    0.00 B/s  941.68 K/s  0.00 % 10.41 % java -Xms7148m -Xmx7148m -XX:+UseConcM~elasticsearch/elasticsearch.pid --quiet
 2215 be/4 elastics    0.00 B/s  895.56 K/s  0.00 %  8.92 % java -Xms7148m -Xmx7148m -XX:+UseConcM~elasticsearch/elasticsearch.pid --quiet
11302 be/4 elastics    0.00 B/s   15.80 M/s  0.00 %  0.01 % java -Xms7148m -Xmx7148m -XX:+UseConcM~elasticsearch/elasticsearch.pid --quiet
 2097 be/4 elastics    0.00 B/s    0.00 B/s  0.00 %  0.00 % java -Xms7148m -Xmx7148m -XX:+UseConcM~elasticsearch/elasticsearch.pid --quiet
 2183 be/4 elastics    0.00 B/s    3.84 K/s  0.00 %  0.00 % java -Xms7148m -Xmx7148m -XX:+UseConcM~elasticsearch/elasticsearch.pid --quiet
 3852 be/4 root        0.00 B/s    3.84 K/s  0.00 %  0.00 % ilogd -d 64 -p 640

Customer:

 9840 be/4 elastics    0.00 B/s  144.04 K/s  0.00 %  4.35 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9839 be/4 elastics    0.00 B/s  534.00 K/s  0.00 %  3.72 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9826 be/4 elastics    0.00 B/s  217.82 K/s  0.00 %  3.42 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9825 be/4 elastics    0.00 B/s  270.51 K/s  0.00 %  3.32 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9829 be/4 elastics    0.00 B/s  129.99 K/s  0.00 %  2.95 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9830 be/4 elastics    0.00 B/s  158.09 K/s  0.00 %  2.73 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9834 be/4 elastics    0.00 B/s  165.12 K/s  0.00 %  2.44 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 6049 be/4 elastics    0.00 B/s 1921.71 K/s  0.00 %  2.07 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9836 be/4 elastics    0.00 B/s  147.55 K/s  0.00 %  1.95 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 6433 be/4 elastics    0.00 B/s    2.45 M/s  0.00 %  1.93 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9831 be/4 elastics    0.00 B/s    0.00 B/s  0.00 %  1.58 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9837 be/4 elastics    0.00 B/s  147.55 K/s  0.00 %  1.42 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9827 be/4 elastics    0.00 B/s  165.12 K/s  0.00 %  1.22 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9832 be/4 elastics    0.00 B/s  140.53 K/s  0.00 %  1.18 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9835 be/4 elastics    0.00 B/s  122.96 K/s  0.00 %  1.17 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9828 be/4 elastics    0.00 B/s  186.20 K/s  0.00 %  1.15 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9833 be/4 elastics    0.00 B/s    0.00 B/s  0.00 %  1.13 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9787 be/4 elastics    0.00 B/s 1401.76 K/s  0.00 %  1.06 % java -Xms14405m -Xmx14405m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupan~ault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
 9838 be/4 elastics    0.00 B/s  158.09 K/

Not sure if this information is important (it could also just be different data being indexed).

We've now mounted an SSD RAID in our development system for our tests. Instead of

RAID 10, 4x Seagate ST300MM0006 300GB SAS 10k 6Gb/s spinning disks,

we're using

RAID 1, 2x Seagate ST200FM0073 200GB SSDs.

While the speed of direct writes to Elasticsearch stayed about the same, both tests of the whole data pipeline reached only about 70% of the previous performance. Shouldn't it be faster on SSD?
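
To compare raw disk throughput of the two systems independently of Elasticsearch, a quick fio run on each data path might help (a sketch; directory, size and sync interval are arbitrary placeholders):

    fio --name=es-write-test --directory=/var/lib/elasticsearch \
        --rw=randwrite --bs=4k --size=1g --numjobs=1 --fsync=32 --runtime=60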
