Buffer Overflow when inputting old logs into Logstash

dlb1001 · October 5, 2015, 12:30pm

Hello,

I've been working on inputting all of our old logs into Logstash.

I've been using this thread to help me implement it:

https://discuss.elastic.co/t/read-old-logs-in-gz-format/24730

Right now I'm just inputting custom Apache logs through a listening port.

I've been using netcat to send the logs to the listening port, for example: cat log | nc localhost 4500. Here is the Logstash config file I'm using:

input {
  tcp {
    type => "apache"
    port => 4500
  }
}

filter {
  if [type] == "apache" {
    grok {
      match => { "message" => [
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{NOTSPACE}%{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{NOTSPACE}%{QS:Referrer} %{NOTSPACE}%{QS:UserAgent}"
      ] }
    }
    date {
      match => ["DateOfRequest", "dd/MMM/YYYY:HH:mm:ss Z"]
      locale => "en"
    }
  }

  geoip {
    source => "OriginIP"
    target => "geoip"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}

output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}

Also I've written a script to automate the process.

Everything seems to work fine except for two problems. First, the input into Logstash is being duplicated: for some reason I'm getting two entries in Logstash for each single entry in the logs. Second, the import works for the 1st, 2nd, 3rd, and sometimes even the 4th file, but eventually Logstash freezes and locks up. Once it reaches this point it stops accepting input, and I have to manually kill the Logstash process and start it again before it will begin accepting logs. I'm not entirely sure how to troubleshoot this problem. When it freezes, I get a Java exception in the Logstash log. The exception is too big to put in this message, so I'll put it in a follow-up.

Is there any way to avoid this and fix the log entry repetition?

Thank you

dlb1001 · October 5, 2015, 12:30pm

The Java Exception:

Exception in thread "|worker" java.nio.BufferOverflowException
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
        at org.jruby.util.io.ChannelStream.bufferedWrite(ChannelStream.java:1100)
        at org.jruby.util.io.ChannelStream.fwrite(ChannelStream.java:1277)
        at org.jruby.RubyIO.fwrite(RubyIO.java:1541)
        at org.jruby.RubyIO.write(RubyIO.java:1412)
        at org.jruby.RubyIO$INVOKER$i$1$0$write.call(RubyIO$INVOKER$i$1$0$write.gen)
        at org.jruby.RubyClass.finvoke(RubyClass.java:742)
        at org.jruby.runtime.Helpers.invoke(Helpers.java:503)
        at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:363)
        at org.jruby.RubyIO.write(RubyIO.java:2490)
        at org.jruby.RubyIO.putsSingle(RubyIO.java:2478)
        at org.jruby.RubyIO.puts1(RubyIO.java:2407)
        at org.jruby.RubyIO.puts(RubyIO.java:2380)
        at org.jruby.RubyIO$INVOKER$i$puts.call(RubyIO$INVOKER$i$puts.gen)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
        at rubyjit.Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026.block_0$RUBY$__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb:52)
        at rubyjit$Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026$block_0$RUBY$__file__.call(rubyjit$Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026$block_0$RUBY$__file__)
        at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
        at org.jruby.runtime.Block.yield(Block.java:142)
        at org.jruby.ext.thread.Mutex.synchronize(Mutex.java:149)
        at org.jruby.ext.thread.Mutex$INVOKER$i$0$0$synchronize.call(Mutex$INVOKER$i$0$0$synchronize.gen)
        at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
        at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
        at rubyjit.Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026.__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb:50)
        at rubyjit.Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026.__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb)
        at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
        at org.jruby.runtime.callsite.ShiftLeftCallSite.call(ShiftLeftCallSite.java:24)
        at rubyjit.Cabin::Channel$$publish_81fc94d65b7d4fcad95b02a7f5b748a45eb5041d33121026.block_2$RUBY$__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/channel.rb:176)
        at rubyjit$Cabin::Channel$$publish_81fc94d65b7d4fcad95b02a7f5b748a45eb5041d33121026$block_2$RUBY$__file__.call(rubyjit$Cabin::Channel$$publish_81fc94d65b7d4fcad95b02a7f5b748a45eb5041d33121026$block_2$RUBY$__file__)
        at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
        at org.jruby.runtime.Block.yield(Block.java:142)
        at org.jruby.RubyHash$13.visit(RubyHash.java:1354)
        at org.jruby.RubyHash.visitLimited(RubyHash.java:648)
        at org.jruby.RubyHash.visitAll(RubyHash.java:634)
        at org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1305)
        at org.jruby.RubyHash.each_pairCommon(RubyHash.java:1350)
        at org.jruby.RubyHash.each19(RubyHash.java:1341)
        at org.jruby.RubyHash$INVOKER$i$0$0$each19.call(RubyHash$INVO

Christian_Dahlqvist · October 5, 2015, 12:36pm

Have you tried piping the data into the stdin input plugin?

dlb1001 · October 5, 2015, 12:42pm

I have not. How would that work with 40 or so logs? I'd want to run a script. Also, I forgot to mention in my previous post that, for some reason, I think there are two entries in Logstash for each unique log entry. I'm not sure why this is happening.

Christian_Dahlqvist · October 5, 2015, 12:57pm

If you provide either a list of files or a pattern that matches the file names to be processed as argument to 'cat', you should be able to process the files sequentially through a single Logstash instance using the stdin input plugin.

dlb1001 · October 5, 2015, 1:23pm

Christian,

The pattern is not an issue; that's actually part of my script. Would I just run logstash -f configfile; logoutputscript.sh? I'm just trying to get the mechanics down. I have no issues with changing the config file for stdin, I know how to do that. I'm just not sure how I would send the log file to stdin while avoiding grok parse errors from extraneous output, like the line I would use to run cat or the line I would use to run my script.

Thanks

Christian_Dahlqvist · October 5, 2015, 2:19pm

When I process multiple files I generally just run it like this, using the stdin plugin:

cat file1.log file2.log file3.log | logstash -f config.conf

or

cat file*log | logstash -f config.conf
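
For a scripted run over many files, the same idea can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name stream_logs is made up for illustration, some of the old logs are assumed to be gzip-compressed (.gz), and the Logstash config is assumed to use the stdin input instead of tcp. Only the file contents travel through the pipe; the command line itself is never part of stdin, so it should not show up as an event or cause grok parse failures.

```shell
# Hypothetical helper (not part of Logstash): write the contents of every
# file given as an argument to stdout, in the order given.
# Assumption: files ending in .gz are gzip-compressed; all others are plain text.
stream_logs() {
  for f in "$@"; do
    case "$f" in
      *.gz) gzip -dc -- "$f" ;;  # decompress to stdout, leave the file untouched
      *)    cat -- "$f" ;;       # plain log, pass through unchanged
    esac
  done
}
```

It could then be invoked as stream_logs access*.log* | logstash -f config.conf, where the glob is a placeholder for whatever pattern the script already uses.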

dlb1001 · October 5, 2015, 4:38pm

Christian,

That worked perfectly. Thank you
