Buffer Overflow when inputting old logs into Logstash

Hello,

I've been working on inputting all of our old logs into Logstash.

I've been using this thread to help me implement it:

https://discuss.elastic.co/t/read-old-logs-in-gz-format/24730

Right now I'm just inputting custom Apache logs through a listening port.

I've been using netcat to send the logs to the listening port, for example: cat log | nc localhost 4500. Here is the Logstash config file I'm using:

input {
  tcp {
    type => "apache"
    port => 4500
  }
}

filter {

  if [type] == "apache" {
    grok {
      match => { "message" => [
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
        "%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{NOTSPACE}%{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{NOTSPACE}%{QS:Referrer} %{NOTSPACE}%{QS:UserAgent}"
      ] }
    }
    date {
      match => ["DateOfRequest", "dd/MMM/YYYY:HH:mm:ss Z"]
      locale => "en"
    }
  }

  geoip {
    source => "OriginIP"
    target => "geoip"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}

output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}

Also I've written a script to automate the process.
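In essence the script just loops over the old log files and pipes each one into the listening port, something like this (a simplified sketch, not the exact script; the directory, port, and pause are just examples):

#!/bin/bash
# Simplified sketch of the feeder script (directory and port are examples).
# Sends each old log to the Logstash tcp input on port 4500.
for f in /var/log/apache/old/*.log; do
  echo "sending $f"
  cat "$f" | nc localhost 4500
  sleep 5   # brief pause between files
done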

Everything seems to work fine except for two problems. First, the input into Logstash is being duplicated: for each entry in the logs I'm getting two entries in Logstash. Second, the first few files (the 1st, 2nd, 3rd, and sometimes even the 4th) go through, but eventually Logstash freezes and locks up. Once it reaches that point the logs stop being ingested, and I have to manually kill the Logstash process and restart it before it will accept logs again. I'm not entirely sure how to troubleshoot this. When the freeze happens I get a Java exception in the Logstash log. The exception is too big to put in this message, so I'll put it in a follow-up.

Any way to avoid this and fix the log entry repetition?

Thank you

The Java Exception:

Exception in thread "|worker" java.nio.BufferOverflowException
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
        at org.jruby.util.io.ChannelStream.bufferedWrite(ChannelStream.java:1100)
        at org.jruby.util.io.ChannelStream.fwrite(ChannelStream.java:1277)
        at org.jruby.RubyIO.fwrite(RubyIO.java:1541)
        at org.jruby.RubyIO.write(RubyIO.java:1412)
        at org.jruby.RubyIO$INVOKER$i$1$0$write.call(RubyIO$INVOKER$i$1$0$write.gen)
        at org.jruby.RubyClass.finvoke(RubyClass.java:742)
        at org.jruby.runtime.Helpers.invoke(Helpers.java:503)
        at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:363)
        at org.jruby.RubyIO.write(RubyIO.java:2490)
        at org.jruby.RubyIO.putsSingle(RubyIO.java:2478)
        at org.jruby.RubyIO.puts1(RubyIO.java:2407)
        at org.jruby.RubyIO.puts(RubyIO.java:2380)
        at org.jruby.RubyIO$INVOKER$i$puts.call(RubyIO$INVOKER$i$puts.gen)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
        at rubyjit.Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026.block_0$RUBY$__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb:52)
        at rubyjit$Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026$block_0$RUBY$__file__.call(rubyjit$Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026$block_0$RUBY$__file__)
        at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
        at org.jruby.runtime.Block.yield(Block.java:142)
        at org.jruby.ext.thread.Mutex.synchronize(Mutex.java:149)
        at org.jruby.ext.thread.Mutex$INVOKER$i$0$0$synchronize.call(Mutex$INVOKER$i$0$0$synchronize.gen)
        at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
        at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
        at rubyjit.Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026.__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb:50)
        at rubyjit.Cabin::Outputs::IO$$\=\^\^_6269f884ef35189823c6682da4fcb5035fcb6e7233121026.__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb)
        at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
        at org.jruby.runtime.callsite.ShiftLeftCallSite.call(ShiftLeftCallSite.java:24)
        at rubyjit.Cabin::Channel$$publish_81fc94d65b7d4fcad95b02a7f5b748a45eb5041d33121026.block_2$RUBY$__file__(/opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/channel.rb:176)
        at rubyjit$Cabin::Channel$$publish_81fc94d65b7d4fcad95b02a7f5b748a45eb5041d33121026$block_2$RUBY$__file__.call(rubyjit$Cabin::Channel$$publish_81fc94d65b7d4fcad95b02a7f5b748a45eb5041d33121026$block_2$RUBY$__file__)
        at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
        at org.jruby.runtime.Block.yield(Block.java:142)
        at org.jruby.RubyHash$13.visit(RubyHash.java:1354)
        at org.jruby.RubyHash.visitLimited(RubyHash.java:648)
        at org.jruby.RubyHash.visitAll(RubyHash.java:634)
        at org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1305)
        at org.jruby.RubyHash.each_pairCommon(RubyHash.java:1350)
        at org.jruby.RubyHash.each19(RubyHash.java:1341)
        at org.jruby.RubyHash$INVOKER$i$0$0$each19.call(RubyHash$INVO

Have you tried piping the data into the stdin input plugin?

I have not. How would that work with 40 or so logs? I'd want to run a script. Also, I forgot to mention in my previous post that for some reason there are two entries in Logstash for the same unique log entry. I'm not sure why this is happening.

If you provide either a list of files or a pattern that matches the file names to be processed as an argument to 'cat', you should be able to process the files sequentially through a single Logstash instance using the stdin input plugin.
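For reference, the input section would then look something like this (just a sketch; keep your existing filter and output sections unchanged, since the filter still matches on type "apache"):

input {
  stdin {
    type => "apache"
  }
}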

Christian,

The pattern is not an issue; that's actually part of my script. Would I just run logstash -f configfile; logoutputscript.sh? I'm just trying to get the mechanics down. I have no issues with changing the config file for stdin; I know how to do that. I'm just not sure how I would send the log files to stdin while avoiding grok parse errors from extraneous output, like the line I would use to run cat or the line I would use to run my script.

Thanks

When I process multiple files I generally just run it like this, using the stdin plugin:

cat file1.log file2.log file3.log | logstash -f config.conf

or

cat file*log | logstash -f config.conf
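If the old files are still gzipped, as in the thread you linked, the same idea should work with zcat in place of cat (untested sketch):

zcat file*.log.gz | logstash -f config.conf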

Christian,

That worked perfectly. Thank you