Logstash speed - virtual vs physical environment

Hey,

we are using the ELK stack for some detection/alerting in incident response cases. Some time ago a colleague created a Python framework to automate the creation and configuration of the complete ELK stack + RabbitMQ within a VMware ESXi environment. The script works fine so far.

Since the script uses "old" versions (ES 1.4) and we are not happy with the indexing speed, I am thinking about updating all the components to newer versions to make indexing faster. Part of this "improvement" is, or might be, new/additional hardware.

During my tests of the individual components I found a "strange" circumstance with Logstash in our virtual environment. It looks like we are "losing" a lot of performance due to our virtual environment.

My setup
Hardware
First Logstash system, virtual:

  • ESXi 6 host (newest version) with 40 cores (4x10x2.20GHz), 384 GB RAM, 24x10k 900GB HDD in RAID10 (default case) or 1x Samsung SM863 960GB SSD (test case).
  • Only one Logstash VM installed on the host: 16 cores, 64 GB RAM.

Second Logstash system, physical:

  • 4 cores, 32 GB RAM, 4x7.2k 4TB HDD in RAID10

Software
Ubuntu 14.04 LTS (newest version/updates), default installation
Logstash 2.2.4, default installation via repo
Test logs: 44,405,227 log rows in 187 files, file size ~18 MB, 4 columns (Date, Time, Url, Servername)
A very simple Logstash metrics filter:

input {
  file {
    path => "/root/logs/*"
    type => "lg224_es24"
    ignore_older => "0"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  if [type] == "lg224_es24" {
    metrics {
      meter => "events"
      add_tag => "metric"
    }
  }
}
output {
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "rate: %{[events][rate_1m]}"
      }
    }
  }
}
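
For reference, both runs below use Logstash's default pipeline settings, so the two machines end up with different worker counts (16 vs. 4). If I read the 2.2 options correctly, workers and batch size could be pinned for a like-for-like comparison, e.g.:

 # hedged example: -w sets the pipeline (filter) workers, -b the batch size (2.x default: 125)
 /opt/logstash/bin/logstash -w 4 -b 125 -f /etc/logstash/speedtest3.conf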

Results
First system (RAID10 and SSD results are roughly equal):

# /opt/logstash/bin/logstash -f /etc/logstash/speedtest3.conf
Settings: Default pipeline workers: 16
Logstash startup completed
rate: 0.0
rate: 8684.2
rate: 9087.192141385285
rate: 9159.296716612702
rate: 9374.465454928883
rate: 9561.988051370114
rate: 9690.365694600969
rate: 9649.479151150666
rate: 9781.111698328365
rate: 9806.128865644157
rate: 9938.029286860687
rate: 10168.842729060667
rate: 10372.390241870278
rate: 10470.384587517554
rate: 10423.70774909221
rate: 10398.72100908067
rate: 10628.29580156215
rate: 10727.449058668883
rate: 10796.670681968226
rate: 10824.937325537152
rate: 10682.077696770191
rate: 10646.235391116312
rate: 10519.598905288813

Peak is at 12k.

Second system:

$ /opt/logstash/bin/logstash -f speedtest3.conf
Settings: Default pipeline workers: 4
Logstash startup completed
rate: 0.0
rate: 10867.2
rate: 11946.53643803584
rate: 13039.24653207971
rate: 14058.580578353722
rate: 14988.993295947235
rate: 15814.679170979603
rate: 16598.541408674624
rate: 17323.103608007415
rate: 17961.74855767535
rate: 18546.212008719103
rate: 19067.265607298563
rate: 19540.549453671785
rate: 19988.928426774728
rate: 20397.77903968827
rate: 20760.79506428221
rate: 21115.542400093032
rate: 21442.1815628836
rate: 21729.495437505026
rate: 21997.003204276654
rate: 22248.159432843207
rate: 22477.25141961804
rate: 22682.31739369109

Peak is at 26k.

I know physical systems are "always" faster, but not twice as fast as in this case. The Logstash rates shown above slow down the whole stack.

Does any of you have an idea whether that's "normal", or what I might change to accelerate the virtual environment?

Thank you
Andreas

Try to determine whether read I/O or the CPU is the bottleneck, then fix that.
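
E.g. something along these lines (assuming the sysstat package is installed), watched while the speed test is running:

 pidstat -t -p <logstash pid> 5   # per-thread CPU of the Logstash JVM
 iostat -x 5                      # disk utilisation, queue size, await
 vmstat 5                         # run queue and steal time (interesting on a VM)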

BR,

Simon

Hey Simon,

I cannot identify any obvious bottleneck. While Logstash is running in screen, all cores are at about 10-20% usage and I/O doesn't look like a problem. The first iostat sample should be ignored; it shows the average since the last boot:

iostat -x 2 5
Linux 4.4.0-31-generic (logstash-test) 	09/13/2016 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.44    0.01    0.19    0.03    0.00   98.34

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00   40.00   40.00    0.00  40.00   0.00
sda               0.00     3.97    0.71    4.56    26.48   469.02   188.23     0.33   62.73    1.34   72.29   0.85   0.45

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.86    0.00    1.07    0.00    0.00   89.07

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.46    0.00    0.91    0.00    0.00   90.63

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.77    0.00    1.29    0.03    0.00   89.91

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.37    0.00    1.35    0.00    0.00   89.28

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

I even started some benchmarks while Logstash was running.
Random write:

fio --name=randwrite --ioengine=libaio --iodepth=16 --rw=randwrite --bs=4k --direct=1 --size=1G --numjobs=8 --runtime=240 --group_reporting
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
...
randwrite: (groupid=0, jobs=8): err= 0: pid=15126: Tue Sep 13 14:10:19 2016
  write: io=8192.0MB, bw=59186KB/s, iops=14796, runt=141732msec
    ...
Run status group 0 (all jobs):
  WRITE: io=8192.0MB, aggrb=59186KB/s, minb=59186KB/s, maxb=59186KB/s, mint=141732msec, maxt=141732msec

Disk stats (read/write):
  sda: ios=0/2089885, merge=0/99531, ticks=0/16814716, in_queue=16833212, util=100.00%

The rate dropped to 10k during the benchmark.

Random read:

fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=1 --size=1G --numjobs=8 --runtime=240 --group_reporting
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
...
randread: (groupid=0, jobs=8): err= 0: pid=15768: Tue Sep 13 14:15:54 2016
  read : io=8192.0MB, bw=98745KB/s, iops=24686, runt= 84952msec
   ...
Run status group 0 (all jobs):
   READ: io=8192.0MB, aggrb=98745KB/s, minb=98745KB/s, maxb=98745KB/s, mint=84952msec, maxt=84952msec

Disk stats (read/write):
  sda: ios=2087595/69, merge=9125/48, ticks=10331020/56, in_queue=10333780, util=99.98%

There was no real drop.

Br
Andreas

No idea what I'm looking at but keep measuring, you'll find something.

Someone else might chime in here too.

btw, do you have the same JVM heap settings?

yep,
virtual:
ps -ef | grep java
root 13523 8497 97 13:36 pts/1 01:39:07 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx1g -Xss2048k -Djffi.boot.library.path=/opt/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logstash/heapdump.hprof -Xbootclasspath/a:/opt/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/opt/logstash/vendor/jruby -Djruby.lib=/opt/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /opt/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /etc/logstash/speedtest3.conf

physical:
ps -ef | grep java
core 13701 5974 99 15:31 pts/15 00:00:15 /usr/lib/jvm/java-7-oracle/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx1g -Xss2048k -Djffi.boot.library.path=/opt/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logstash/heapdump.hprof -Xbootclasspath/a:/opt/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/opt/logstash/vendor/jruby -Djruby.lib=/opt/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /opt/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f speedtest3.conf
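
Both run with the default 1 GB heap (-Xmx1g). If I understand the 2.x startup scripts correctly, the heap could be raised for a test run via LS_HEAP_SIZE, e.g.:

 # assumption: bin/logstash in 2.x picks up LS_HEAP_SIZE for -Xmx
 LS_HEAP_SIZE=4g /opt/logstash/bin/logstash -f /etc/logstash/speedtest3.conf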

I can't really comment on your issues as I'm not an infrastructure guy, but we run our entire ELK stack and hundreds of servers only as VMs and we get good performance, so it is possible.

Thanks for your reply, Alex. I guess I am far away from identifying the real problem, but I have found a solution for my needs. I can increase the rate by adding additional file inputs: I divide the logs into separate folders and define one file input for each folder:

input {
  file {
    path => "/root/logs3/*"
    type => "lg224_es24"
    ignore_older => "0"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  file {
    path => "/root/logs/*"
    type => "lg224_es24"
    ignore_older => "0"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  file {
    path => "/root/logs2/*"
    type => "lg224_es24"
    ignore_older => "0"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

With 4 folders the rate increases up to 30k/sec.
Is such behavior known for the input workers?
BR
Andreas

I can't comment on the input file plugin but maybe you can use input tcp
instead?

Maybe input file doesn't do files in parallel? If so you are limited to
how fast your OS can read one file. Maybe this is why it differs?

Are these logfiles just for testing? If so, use something like:

 cat > netcat.sh <<'EOF'
 #!/bin/sh
 # reads a NUL-separated file list on stdin and cats up to 12 files in parallel into nc
 xargs -0 -P12 -n1 cat | nc localhost 514
 EOF
 chmod +x netcat.sh
 find /root/logs -type f -print0 | ./netcat.sh

Adjust the script to suit your environment. The -P argument to xargs adjusts how many parallel streams there are.

What creates the logs? Some other syslog daemons? If so, point them toward Logstash via the tcp input.
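
A minimal tcp input for such a test could look roughly like this (port 514 to match the nc example above, which needs root; adjust port/type as needed):

input {
  tcp {
    port => 514
    type => "lg224_es24"
  }
}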

Thanks. Are you sure netcat can be used in parallel against the same destination port? I got many parsing warnings; it looks like Logstash mixes up rows from different logs.

Nevertheless, even with one process (file after file) the rate reaches up to 15k, and that is definitely faster than the file input.
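
(By one process I mean roughly the same script with -P1, i.e. something like:)

 find /root/logs -type f -print0 | xargs -0 -P1 -n1 cat | nc localhost 514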

We get the logs from the operations team via SSH (scp) to our delivery server.