Elasticsearch performance issue, not able to index 1000 documents per second


#1

Hello,

I'm fairly new to the Elastic stack. I've been doing some research on my own trying to solve this issue, but haven't had any luck, so I'm kindly asking anyone here to help me out with this matter.

Scenario:

Simple: Two servers connected back to back:

ServerA -10g- ServerB

ServerA running Elasticsearch

[root@elasticsearch elasticsearch]# /usr/share/elasticsearch/bin/elasticsearch --version
Version: 2.2.2, Build: fcc01dd/2016-03-29T08:49:35Z, JVM: 1.8.0_65

ServerB running Logstash

[root@logstash conf.d]# /opt/logstash/bin/logstash --version
logstash 2.2.4

Purpose
Elasticsearch should be able to index a minimum of around 1,500 documents per second.

Issue

On ServerB I have Logstash and in ServerA I have Elasticsearch.

After executing a Perl script that generates around 4.5 million logs (it sends 1,000 logs every second) over roughly 78 minutes, I noticed that once the script finished, fewer than 2.8 million documents had been indexed, meaning that a little less than 2 million documents got lost. There is a bottleneck somewhere.

I ruled out Logstash as the bottleneck: I redirected Logstash's output to a file and was able to successfully write 1M, 2M, even 8M lines to it without a single loss.

Logstash configuration

input {
  udp {
    port => 5002
  }
}

filter {
    grok {
      match => { "message" => "%{NUMBER:Log} %{WORD:MAC} %{NUMBER:Var0} %{NUMBER:Var1} %{NUMBER:Var2} %{NUMBER:Var3} %{NUMBER:Var4} %{NUMBER:Var5} %{NUMBER:Var6} %{NUMBER:Var7} %{NUMBER:Var8} %    {NUMBER:Var9}" }
    }
}

output {
  elasticsearch { hosts => ["10.10.40.225:9200"] }
}
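(For context on the tuning discussed later in this thread: the elasticsearch output in Logstash 2.x also accepts `workers` and `flush_size` options. A tuned version of the output above might look like the following sketch; the values are illustrative, not something measured here.)

```
output {
  elasticsearch {
    hosts      => ["10.10.40.225:9200"]
    workers    => 4       # parallel output threads; try matching the filter workers
    flush_size => 5000    # documents batched into each bulk request
  }
}
```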

Elasticsearch configuration

Elasticsearch configuration is pretty much default except for the following tweaks I’ve done based on some reading I’ve been doing about Elasticsearch performance:

-File Descriptors, increased to 64k

[root@elasticsearch ~]# cat /proc/18045/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             30504                30504                processes
Max open files            65535                65535                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30504                30504                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
[root@elasticsearch ~]#

-HEAP, increased to 4g (half of current RAM on the server)

[root@elasticsearch ~]# !ps
ps -ef | grep -i elasticse
avahi      799     1  0  2016 ?        00:00:01 avahi-daemon: running [elasticsearch.local]
elastic+ 18045     1 99 19:25 ?        00:00:04 /bin/java -Xms4g -Xmx4g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.2.2.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.conf=/etc/elasticsearch

(Christian Dahlqvist) #2

The first thing I would recommend you do is tune your Logstash config. Write a good amount of data to a file and use that as input; the file input performs well and will not drop data. Then tune the number of output and filter workers and see how this affects the indexing rate in Elasticsearch. That should give you a good idea of what the Elasticsearch node can handle.
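A minimal file-input config for this kind of test might look like the following sketch (the path is hypothetical; `sincedb_path => "/dev/null"` makes Logstash re-read the file from the start on each run instead of remembering its position):

```
input {
  file {
    path           => "/home/satnet/testdata.log"   # hypothetical test file
    start_position => "beginning"
    sincedb_path   => "/dev/null"   # don't persist the read position between runs
  }
}
```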

Once this indexing rate is higher than the rate you need to support and will be generating, have a look at how your test script generates data. Does it generate 1,000 UDP records at once and then sleep for a second, or does it spread the generation out over time? Is it likely to flood the buffers and thereby cause data loss at the input stage?
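If the UDP input does turn out to be dropping packets, the udp input plugin in Logstash 2.x exposes a few knobs worth experimenting with. A sketch (the values are illustrative, not recommendations):

```
input {
  udp {
    port        => 5002
    workers     => 4        # threads reading from the socket
    queue_size  => 10000    # packets held in the input queue
    buffer_size => 65536    # maximum packet size to read
  }
}
```

Raising the kernel receive buffer on the Logstash host (e.g. `net.core.rmem_max`) can also help absorb bursts.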


#3

Thanks for your input...

Let me do further testing and I will get back to you...


#4

Hello,

I've been doing some tests based on the feedback provided and can't seem to get more than 640 docs per second into ES!

Here the results…

Default, Pipeline workers = 4

[root@logstash conf.d]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/fileinput.conf
Settings: Default pipeline workers: 4
Logstash startup completed

During execution

Elasticsearch

[root@elasticsearch elasticsearch]# !top
top -H
top - 13:17:18 up 23 days,  2:40,  2 users,  load average: 4.03, 3.63, 3.22
Threads: 379 total,   1 running, 378 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.8 us,  0.2 sy,  0.0 ni, 57.0 id, 38.9 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  7840044 total,  2695332 free,  4195752 used,   948960 buff/cache
KiB Swap:  8126460 total,  8126460 free,        0 used.  3327752 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
18225 elastic+  20   0 8035868 3.416g  15816 D  2.7 45.7   4:16.20 java
18226 elastic+  20   0 8035868 3.416g  15816 D  2.3 45.7   4:16.98 java
18227 elastic+  20   0 8035868 3.416g  15816 S  2.3 45.7   3:24.33 java
18239 elastic+  20   0 8035868 3.416g  15816 D  2.3 45.7   4:13.17 java
18240 elastic+  20   0 8035868 3.416g  15816 D  2.3 45.7   4:12.65 java
18228 elastic+  20   0 8035868 3.416g  15816 S  1.7 45.7   3:26.93 java
18060 elastic+  20   0 8035868 3.416g  15816 S  0.3 45.7   0:47.95 java
18062 elastic+  20   0 8035868 3.416g  15816 S  0.3 45.7   0:47.90 java
...

Logstash

[root@logstash ]# top -H
top - 13:17:57 up 48 days, 18:09,  4 users,  load average: 0.07, 0.14, 0.15
Threads: 362 total,   1 running, 361 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.8 us,  0.2 sy,  0.0 ni, 94.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7840044 total,  4663240 free,   892524 used,  2284280 buff/cache
KiB Swap:  8126460 total,  8123128 free,     3332 used.  6398128 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
10905 root      20   0 4632612 324440  14704 S  4.3  4.1   0:22.32 java
10919 root      20   0 4632612 324440  14704 S  2.3  4.1   0:12.07 [base]<file
10924 root      20   0 4632612 324440  14704 S  2.3  4.1   0:09.47 [base]>worker0
10926 root      20   0 4632612 324440  14704 S  2.3  4.1   0:09.81 [base]>worker2
10927 root      20   0 4632612 324440  14704 S  2.0  4.1   0:09.24 [base]>worker3
10925 root      20   0 4632612 324440  14704 S  1.3  4.1   0:09.56 [base]>worker1
10901 root      20   0 4632612 324440  14704 S  1.0  4.1   0:03.78 java
10902 root      20   0 4632612 324440  14704 S  1.0  4.1   0:03.81 java
10903 root      20   0 4632612 324440  14704 S  1.0  4.1   0:03.79 java
10920 root      20   0 4632612 324440  14704 S  0.3  4.1   0:00.21 [base]-pipeline
...

12 mins later

[root@elasticsearch elasticsearch]# curl -XGET 'http://192.168.9.95:9200/_cat/indices?v'
health status index                 pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2017.01.05     5   1     455715            0    247.5mb        247.5mb

455715 / 720 seconds (12 * 60) ≈ 633 docs per second
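The rate calculation above, as a quick shell sketch (integer division rounds down):

```shell
# Approximate indexing rate: documents indexed divided by elapsed seconds
docs=455715
secs=$((12 * 60))              # 12-minute run
echo "$((docs / secs)) docs/sec"
```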


INCREASING PIPELINE WORKERS TO 8

[root@logstash conf.d]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/fileinput.conf -w 8
Settings: User set pipeline workers: 8, Default pipeline workers: 4
Logstash startup completed

12 mins later:

[root@elasticsearch elasticsearch]# curl -XGET 'http://192.168.9.95:9200/_cat/indices?v'
health status index                 pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2017.01.05     5   1     435474            0    251.3mb        251.3mb

435474 / 720 ≈ 605 docs per second


INCREASING HEAP on LS to 4g

I increased it by modifying the file logstash.lib.sh

[root@logstash sysconfig]# vi /opt/logstash/bin/logstash.lib.sh

Checked that -Xmx was increased to 4g:

[root@logstash sysconfig]# !ps
ps -ef | grep -i Xmx
root      9603 14998 99 12:52 pts/1    00:00:14 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx4g -Xss2048k -

12 mins later

[root@elasticsearch elasticsearch]# curl -XGET 'http://192.168.9.95:9200/_cat/indices?v'
health status index                 pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2017.01.05     5   1     461982            0    260.5mb        260.5mb

461982 / 720 ≈ 642 docs per second


One last test

-The Perl script is executed, generating 8,060,000 logs in about 20 minutes, meaning that around 6,400 logs were generated per second.

[root@logstash satnet]# time perl loggen.pl

real    20m45.322s
user    1m29.239s
sys     1m15.098s

-The Logstash output was set to a file:

output {
   file{
      path => "/home/satnet/logstash.log"
   }
}

and after the 20 mins, all 8,060,000 logs were written:

[root@logstash satnet]# cat logstash.log | wc -l
8060004

Seems that ES is not my friend...

Logstash was capable of receiving 8,060,000 logs (at around 6,400 logs per second approx), filtering them, and outputting all of them to the file, so it seems that the bottleneck is ES, not LS.


(Christian Dahlqvist) #5

How many output workers did you test with for the Elasticsearch output? Try setting it to the same as the number of filter workers.

Have you followed the guidelines for indexing optimisation, especially increasing the refresh interval?
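For reference, in ES 2.x the refresh interval can be raised per index via the index settings API; and since the index above is yellow (an unassigned replica on a single-node cluster), dropping replicas to 0 is another easy win while bulk indexing. A sketch, with the host and index name taken from this thread and the 30s value purely illustrative:

```
curl -XPUT 'http://10.10.40.225:9200/logstash-2017.01.05/_settings' -d '{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}'
```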


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.