I'm having a problem with Logstash 2.2.2 on Oracle Java 1.8 simply stopping, and I can't figure out what the cause is.
There are multiple Logstash processes running (one for Windows event logs, one for syslog), each listening on a different port, which then forward data via the lumberjack plugin to a Logstash instance on the boxes running Elasticsearch.
The architecture looks like this (a rough sketch of the Windows-side config follows the diagram):
Linux Server --syslog--> Linux-LS-process --using lumberjack--> LS --> ES
Windows Server --nxlog sending json--> Windows-LS-process --using lumberjack--> LS --> ES
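For reference, the Windows-LS-process pipeline is roughly the sketch below, reconstructed from the plugin settings in the error output further down; the lumberjack port and certificate path are placeholders, not my actual values.

input {
  tcp {
    port  => 5001
    type  => "eventlog"
    codec => json { charset => "CP1252" }
  }
}
output {
  lumberjack {
    hosts           => ["X.X.X.X"]                # central LS in front of ES
    port            => 5000                       # placeholder
    ssl_certificate => "/path/to/lumberjack.crt"  # placeholder
  }
}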
The issue appears to be limited to the Windows-LS-process, which stops sending event logs for an unknown reason, while the Linux-LS-process continues to forward syslog messages to ES.
It appears to be a throughput issue, leading to memory growth and eventually the blockage, but I don't have good enough telemetry to confirm that. I'm looking for a tool that would give me a count of messages in the queue, as well as the number of messages processed per second at each Logstash stage (input, filter, output), but haven't found one yet. Are there any tools that can help identify and solve this issue? CPU utilization appears low on all systems, so it isn't a hardware bottleneck unless the processes simply aren't taking advantage of what's available.
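The only stopgap I can think of is the metrics filter, which can at least emit a running event count and per-second rate; here's a minimal sketch assuming the stock logstash-filter-metrics plugin that ships with 2.2 (it still doesn't show queue depth, which is what I'm really after):

filter {
  metrics {
    meter   => "events"    # counts every event that reaches the filter stage
    add_tag => "metric"
  }
}
output {
  # On each flush, print the running count and 1-minute rate instead of shipping the event
  if "metric" in [tags] {
    stdout {
      codec => line { format => "events count=%{[events][count]} rate_1m=%{[events][rate_1m]}" }
    }
  }
}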
ES is definitely contributing to the backlog, but something is wrong that affects only the Logstash process handling Windows event logs, since the syslog Logstash processes running on the same box and sending to the same ES cluster continue to deliver messages.
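One way to at least quantify the ES side would be to watch the bulk thread pool for queued and rejected requests (assuming the _cat API in ES 2.x):

curl -s 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'

If bulk.rejected keeps climbing, ES back-pressure is at least part of the story, but that still wouldn't explain why only the eventlog pipeline stalls.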
I do sometimes see messages like this from the Windows Logstash process:
{:timestamp=>"2016-08-26T17:28:19.598000+0000", :message=>"All hosts unavailable, sleeping", :hosts=>["X.X.X.X"], :e=>#<RuntimeError: Could not connect to any hosts>, :backtrace=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.26/lib/lumberjack/client.rb:31:in connect'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.26/lib/lumberjack/client.rb:24:ininitialize'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-lumberjack-2.0.4/lib/logstash/outputs/lumberjack.rb:93:in connect'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-lumberjack-2.0.4/lib/logstash/outputs/lumberjack.rb:72:inflush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-lumberjack-2.0.4/lib/logstash/outputs/lumberjack.rb:68:in flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:inbuffer_flush'", "org/jruby/RubyHash.java:1342:in each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:inbuffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:193:in buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:112:inbuffer_initialize'", "org/jruby/RubyKernel.java:1479:in loop'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:110:inbuffer_initialize'"], :level=>:error}
{:timestamp=>"2016-08-26T17:28:19.657000+0000", :message=>"A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::Tcp port=>5001, type=>"eventlog", codec=><LogStash::Codecs::JSON charset=>"CP1252">, host=>"0.0.0.0", data_timeout=>-1, mode=>"server", ssl_enable=>false, ssl_verify=>true, ssl_key_passphrase=>>\n Error: closed stream", :level=>:error}