Logstash builds up Recv-Q during minute-long pauses

Hello,

We are sending syslog data over the wire, using stunnel for encryption. When it reaches the Logstash server (Ubuntu 14.04), we periodically build up a large Recv-Q as the data moves out of the server-side stunnel and into Logstash's TCP port 5000.

During this time, the JVM is not logging any garbage collection. In fact, "jstat -gccause" shows the counters staying static for the duration of this "lockup". Once the Recv-Q drains into Logstash and things get moving again, we see normal YGC activity, which is frequent but keeps things moving along just fine for us.

I wish it were pausing for GC, but I can't find any evidence that it is; it just seems to be blocking on the TCP socket coming into Logstash.
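
For reference, this is roughly how we're watching it (port 5000 is our Logstash input; the PID is a placeholder for our Logstash JVM):

  # Recv-Q on the Logstash TCP input, sampled once a second
  watch -n 1 "netstat -tn | grep ':5000 '"

  # GC counters on the Logstash JVM, sampled once a second
  jstat -gccause <logstash_pid> 1000

The Recv-Q climbs while the jstat numbers sit completely still.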

We run Logstash 1.5.4 on Oracle's Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode):

/usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -d64 -Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8 -XX:PermSize=128m -XX:MaxPermSize=128m -javaagent:/usr/lib/jvm/java-7-oracle/jre/lib/jolokia-jvm-1.2.3-agent.jar=host=localhost,port=8779,policyLocation=file:///usr/lib/jvm/java-7-oracle/jre/lib/jolokia-access.xml -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+ParallelRefProcEnabled -Djava.io.tmpdir=/tmp/logstash -Djava.security.properties=/etc/logstash/java.security -Xmx4096m -Xss2048k -Djffi.boot.library.path=/opt/logstash/vendor/jruby/lib/jni -Xbootclasspath/a:/opt/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/opt/logstash/vendor/jruby -Djruby.lib=/opt/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /opt/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /etc/logstash/conf.d -l /srv/log/logstash/logstash.log

Any pointers?

What does your config look like?

Here is the input portion:

input {
  tcp {
    mode => "server"
    host => "127.0.0.1"
    port => 5000
    codec => "json"
  }
}

After that it goes through a few filters, then it's off to Graylog:

output {
  gelf {
    host => "127.0.0.1"
    port => 12201
  }
}

Are you sure it's not Graylog? I.e., have you changed the output to a file or stdout and watched the flow?
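
For a quick test, something along these lines would take Graylog out of the picture entirely (the file path is just an example):

output {
  # temporary debug outputs in place of gelf, just to watch the flow
  stdout { codec => rubydebug }
  file { path => "/tmp/logstash-debug.log" }
}

If the Recv-Q still builds up with that in place, the stall is upstream of the output.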

The GELF output is UDP. I sniff 127.0.0.1 and see that when Graylog stops receiving messages, nothing is on the line. Blasting UDP shouldn't block, should it?

You're right that blasting UDP shouldn't block. Have you tried checking out the threads in VisualVM, to see which threads are live and which are idle? A screenshot of that would be very useful.
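
If VisualVM is awkward to hook up to that box, a couple of thread dumps taken a few seconds apart while the Recv-Q is building would show much the same thing (PID is a placeholder):

  # capture thread states during the stall, a few seconds apart
  jstack <logstash_pid> > /tmp/logstash-threads-1.txt
  sleep 5
  jstack <logstash_pid> > /tmp/logstash-threads-2.txt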