Logstash silently stops processing events and does not respond to SIGTERM


(Robin P Blanchard) #1

Logstash happily plows along for an indeterminate amount of time (sometimes a couple of days, sometimes a couple of hours) and then nothing else is passed through to ES. At this point, the system is nearly idle (the only way I've found so far to "monitor" for this sad symptom is to look at top/htop for an abnormally idle system and to query Kibana/ES to confirm there is no new data). ES still reports LS as connected. LS will not respond to HUP or restart cleanly; an ungraceful kill (-9) followed by a restart is required. I've turned LS logging up to --verbose, but am still finding nothing telling in the logs. I'm glad to provide an strace of LS while it is "out to lunch" if that would be useful. Let me/us know what we can do to help diagnose further.
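
For anyone wanting to automate the "is there new data in ES?" check, here is a minimal sketch. It assumes Elasticsearch answers on localhost:9200 and that events land in indices matching logstash-*; both values, and all function names, are illustrative rather than taken from my setup. It samples the document count twice and flags a stall if the count did not grow.

```shell
#!/bin/sh
# Assumptions: ES_URL and INDEX_PATTERN are placeholders; adjust for your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX_PATTERN="${INDEX_PATTERN:-logstash-*}"

# Extract the integer that follows "count": in a _count response read from stdin.
parse_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Ask Elasticsearch how many documents the index pattern currently holds.
doc_count() {
    curl -s "$ES_URL/$INDEX_PATTERN/_count" | parse_count
}

# Succeed (exit 0) when the second sample is not larger than the first.
is_stalled() {
    [ "$2" -le "$1" ]
}

# Take two samples, INTERVAL seconds apart (default 300), and report the result.
check_once() {
    interval="${1:-300}"
    before=$(doc_count)
    sleep "$interval"
    after=$(doc_count)
    if is_stalled "${before:-0}" "${after:-0}"; then
        echo "STALLED: count stayed at ${after:-unknown} for ${interval}s"
        return 1
    fi
    echo "OK: $before -> $after"
}
```

Running `check_once 300` from cron and alerting on a non-zero exit would be one way to use it. A quiet period with genuinely no incoming logs would also trip it, so the interval needs to suit the traffic.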

I've gone through my config and wrapped all my conditionals with an extra conditional (to verify the field exists before querying against its value) so as to avoid this (https://goo.gl/XDd4kH) possible problem...

logstash --version

logstash 1.5.0

cat /etc/redhat-release

CentOS Linux release 7.1.1503 (Core)

uname -r

3.10.0-229.1.2.el7.x86_64

java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (rhel-2.5.5.1.el7_1-x86_64 u79-b14)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

resurrected from:
https://groups.google.com/forum/#!topic/logstash-users/wtJR2aNpqBs


(Dan) #2

You mentioned that CPU usage is nil (or close to nil). How is memory usage? What does the ps -ef|grep logstash output look like? Can you start a second jruby process (just a hello world)?
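
To grab memory and thread numbers in one go while the process is wedged, something like the following might help. It is only a rough sketch that reads /proc, so it assumes Linux; the function name `proc_summary` is made up for illustration.

```shell
# Print resident memory, swap usage and thread count for one PID.
# Reads /proc/<pid>/status, so this works on Linux only.
proc_summary() {
    pid="$1"
    if [ ! -r "/proc/$pid/status" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    grep -E '^(VmRSS|VmSwap|Threads):' "/proc/$pid/status"
}

# Hypothetical usage: summarize the first Logstash JVM that pgrep finds.
# proc_summary "$(pgrep -f logstash/runner.rb | head -n 1)"
```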

Also, have you checked the logstash issues list? https://github.com/elastic/logstash/issues

I haven't seen this before, just brainstorming.


(Robin P Blanchard) #3

Thanks for the reply... The symptom has manifested after barely one hour... I'd love to troubleshoot this while it is in this state.

ps -ef |grep logstash |grep java

logstash 1110 1 99 09:08 ? 13:27:08 java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/var/lib/logstash -Xmx2g -Xss2048k -Djffi.boot.library.path=/opt/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/var/lib/logstash -Xbootclasspath/a:/opt/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/opt/logstash/vendor/jruby -Djruby.lib=/opt/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /opt/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /etc/logstash/conf.d -l /var/log/logstash/logstash.log -w 6

type logstash_open_logs

logstash_open_logs is a function
logstash_open_logs ()
{
lsof -p $(ps ax |grep java |grep logstash |awk '{print $1}') | LC_ALL="C" LC_ALL="C" grep --color=auto --color=auto "syslog-ng" | awk '{print $NF}'
}

logstash_open_logs |wc -l

385

top -b -n1 |head

top - 12:25:48 up 62 days, 29 min, 10 users, load average: 0.01, 0.07, 0.33
Tasks: 232 total, 1 running, 231 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 0.8 sy, 25.7 ni, 68.3 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 8011104 total, 146784 free, 5940436 used, 1923884 buff/cache
KiB Swap: 1679356 total, 1346700 free, 332656 used. 1642880 avail Mem

ps -efT |grep logstash |grep java |wc -l

176


(Mark Walkom) #4

That's a pretty massive config file with a lot of comments!

I'd raise this on GH.


(Phil Hagen) #5

Seeing identical behavior on a similar host setup (CentOS, Java, and kernel versions all match). I'm processing via the file{} input. Was an issue logged on GH or any other solution identified? Happy to help troubleshoot.

I can get this to occur predictably: exactly 8010 records are imported from my data set, then LS stops sending events to Elastic and doesn't respond to SIGTERM. (Same data set I sent you a while back, @warkolm, BTW.)
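
Since the stall is that repeatable, one way to narrow it down is to cut a small window of records around the stopping point out of the data set and feed only those to a test instance. A minimal sketch, assuming a line-oriented input file (the path and the function name `record_window` are placeholders). With batching and multiple filter workers the culprit may not be exactly record 8011, so a window is safer than a single line.

```shell
# Print records FIRST..LAST (1-based, inclusive) of a line-oriented file,
# then stop reading so large files are not scanned to the end.
record_window() {
    file="$1"; first="$2"; last="$3"
    sed -n "${first},${last}p;${last}q" "$file"
}

# Hypothetical usage: keep a few records either side of the stopping point.
# record_window /path/to/dataset.log 8005 8020 > suspect.log
```

If the test instance hangs on `suspect.log` alone, halving the window repeatedly should get down to a single record.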

Edit: Found it. Link: https://github.com/elastic/logstash/issues/3276


(Phil Hagen) #6

Sorry for the self-reply... continuing research on this.
It seems the problem is related to the geoip{} filter in my case. Specifically, when doing a lookup against an ASN database, LS hangs indefinitely. Still tracking down details, but it may be related to https://github.com/logstash-plugins/logstash-filter-geoip/issues/25

