Logstash going OOM - help confirming issue

I've recently begun having problems where Logstash keeps going out of memory a few seconds (or maybe a minute) after startup. I have an idea of what may be happening, but I'm hoping some of the more seasoned pros out here can steer me away from the wrong path, or confirm my suspicions.

I have recently added some new Groks, so I am reasonably confident that this is related to filtering (not input or output).

I downloaded VisualVM to analyze the heap dump, and the 'Classes' breakdown shows that org.joni.StackEntry has 25.8 million instances (95% of all instances) with a size of 931M (71.7% of the heap).

I looked up that class, and it seems related to the Regex engine, and my thinking is that I've done something to cause Catastrophic Backtracking (an insanely cool word for something very not cool to happen in Production).
So basically, I think I've got some regexes with too many lookaheads (maybe?)
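To illustrate what I suspect (this is my own sketch, not something pulled from the heap dump), the fallback match I use everywhere looks like this:

grok {
  match => { "message" => "(?<NOMATCH>(.|\r|\n)*)" }
}

My understanding is that the alternation inside the repeat makes joni push a backtrack entry (org.joni.StackEntry) for every character it consumes, so a very long line costs a huge number of stack entries per match attempt. If that's right, something like (?m).* (letting the dot match newlines) should capture the same thing without the per-character alternation:

grok {
  match => { "message" => "(?<NOMATCH>(?m).*)" }
}

That's just my guess at a lighter-weight equivalent, though.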

I will put my filter block in a comment (too many characters)

Indexer file: Part 1 (Whole thing is too large for 1 comment):

input {
  redis {
    host => ["REDACTED"]
    port => 6379
    key => "logstash"
    data_type => "list"
  }
}

filter {
  ruby {
    code => "event['logfilename'] = File.basename(event['path'],File.extname(event['path']))"
  }

  if ( "apache2_access.log" in [tags] ) {
    grok {
      match => { "message" => "%{IPORHOST:apache_host} %{COMMONAPACHELOG} - %{QS:referrer} %{QS:unknown} %{QS:agent}" }
      match => { "message" => "%{COMMONAPACHELOG}" }
      match => { "message" => "(?<NOMATCH>(.|\r|\n)*)" }
    }
    mutate {
      gsub => [
        "referrer", "\"", ""
      ]
    }
    mutate {
      gsub => [
        "unknown", "\"", ""
      ]
    }
    mutate {
      gsub => [
        "agent", "\"", ""
      ]
    }
  }

  if ( "apache2_rewrite.log" in [tags] ) {
    grok {
      match => { "message" => "%{IPORHOST:remote_host} %{NOTSPACE:remote_logfile} %{NOTSPACE:remote_user} \[%{HTTPDATE}\] \[%{IPORHOST:apache_host}\/sid#(?<sid>(?:((?>(?!\]))).)*)\]\[rid#(?<rid>(?:((?>(?!\/))).)*)\/(?<rid_type>(?:((?>(?!\]))).)*)\] \(%{NUMBER:count}\) (?<msg>(.|\r|\n)*)" }
      match => { "message" => "(?<NOMATCH>(.|\r|\n)*)" }
    }
  }

  if ( ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) or ("REDACTED.log" in [tags]) ){ 
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:LogTimestamp}%{SPACE}\[(?<ExecRequest>[^\]]*)]%{SPACE}%{LOGLEVEL:LogLevel}%{SPACE}(?<JavaClass>[0-9A-Za-z\._]+)-\[(?<Block1>(?:((?>(?!\]-))).)*)\]-\[(?<Block2>(?:((?>(?!\]-))).)*)\]-(?<MainMessage>(.|\r|\n)*)" }
      #This is a fallout match. If a log message doesn't match the above pattern, it will fall into the nomatch bucket
      # This will be helpful to identify any pattern matching work that needs to be done in the future
      match => { "message" => "(?<NOMATCH>(.|\r|\n)*)" }
    }
    date {
      match => [ "LogTimestamp", "ISO8601" ]
      target => "@logtimestamp"
    }
    if [MainMessage] =~ "\A{(.|\r|\n)*" {
      ruby {
        code => "event['json'] = event['MainMessage']"
      }
    }
    if ( "" in [json] ) {
      json {
        source => "json"
      }
    }

    #Removing fields no longer needed
    mutate { remove_field => ["json"] }
    
    #Removing REDACTED field
    mutate { remove_field => [ "REDACTED" ] }
    mutate {
      gsub => [
        "message", "\"REDACTED\":\"(?:((?>(?!\",))).)*\"", "\"REDACTED\":\"****\"",
        "MainMessage",  "\"REDACTED\":\"(?:((?>(?!\",))).)*\"", "\"REDACTED\":\"****\""
      ]
    }
    
    if ( [logfilename] in ["REDACTED","REDACTED"] ) {
      mutate {
        rename => { "Block1" => "SesnId"}
        rename => { "Block2" => "ExpId"}
      }
    }
    if ( [logfilename] in ["REDACTED"] ) {
      mutate {
        rename => { "Block1" => "TranId" }
        rename => { "Block2" => "SesnId" }
      }
    }

    #GEO IP
    if [RemoteAddress] =~ /(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?|(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))/ {
      geoip {
        source => "RemoteAddress"
        database => "/etc/logstash/GeoLiteCity.dat"
      }
    }
  }

Indexer file part 2:

  #REDACTED GROUP#
  if ( "REDACTED" in [tags] ) {
    #REDACTED#
    if ( "REDACTED.log" in [tags] ) {
      grok {
        match => { "message" => "\[%{TIMESTAMP_ISO8601:@logtimestamp}\]%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[(?<Block1>(?:((?>(?!\]-))).)*)\]\[(?<codefile>(?:((?>(?!:))).)*):(?<linenum>(?:((?>(?!\]-))).)*)\]%{SPACE}(?<logmessage>(.|\r|\n)*)" }
        match => { "message" => "(?<NOMATCH>(.|\r|\n)*)" }
      }
      date {
        match => ["@logtimestamp", "ISO8601" ]
        target => "@logtimestamp"
      }
      if [logmessage] =~ "\A{(.|\r|\n)*" {
        mutate {
          gsub => [
            "logmessage", "=", ":"
          ]
        }
        mutate {
          gsub => [
            'logmessage', '([{ ])([a-z]+):', '\1"\2":'
          ]
        }
        mutate {
          gsub => [
            'logmessage', '(:)(?<!")((?:\b|/)\S+\b)(?!")', '\1"\2"'
          ]
        }
        json { source => "logmessage"  }
      }
    }
    #/REDACTED#
    
    
    #REDACTED#
    if ( "REDACTED.txt" in [tags] ) {
      grok {
        match => { "message" => "%{IPORHOST:clientip}%{SPACE}%{HTTPDUSER:ident}%{SPACE}%{USER:auth}%{SPACE}\[%{HTTPDATE:@logtimestamp}\]%{SPACE}\"(?:%{WORD:verb}%{SPACE}%{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\"%{SPACE}%{NUMBER:response}%{SPACE}(?:%{NUMBER:bytes}|-)" }
        match => { "message" => "%{IPORHOST:clientip}%{SPACE}%{QUOTEDSTRING:x-forwarded-for}%{SPACE}%{QUOTEDSTRING:set-cookie}%{SPACE}%{QUOTEDSTRING:jsessionid}%{SPACE}%{HTTPDUSER:ident}%{SPACE}%{USER:auth}%{SPACE}\[%{HTTPDATE:@logtimestamp}\]%{SPACE}\"(?:%{WORD:verb}%{SPACE}%{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\"%{SPACE}%{NUMBER:response}%{SPACE}(?:%{NUMBER:bytes}|-)" }
        match => { "message" => "(?<NOMATCH>(.|\r|\n)*)" }
      }
      date {
        match => ["@logtimestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
        target => "@logtimestamp"
      }
      if [clientip] =~ /(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?|(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))/ {
        geoip {
          source => "clientip"
          database => "/etc/logstash/GeoLiteCity.dat"
        }
      }
      if [bytes] == "-" {
        mutate {
          gsub => [
            "bytes", "-", "0"
          ]
        }
      }
    }
    #/REDACTED#
    
    #REDACTED#
    if ( "REDACTED.log" in [tags] ) {
      grok {
        match => { "message" => "\[%{TIMESTAMP_ISO8601:@logtimestamp}\]%{SPACE}%{WORD:loglevel}%{SPACE}\[%{NOTSPACE:logger}%{SPACE}(?<method>(?:((?>(?!\]))).)*)\]%{SPACE}(?<logmessage>(.|\r|\n)*)" }
        match => { "message" => "(?<NOMATCH>(.|\r|\n)*)" }
      }    
      date {
        match => ["@logtimestamp", "ISO8601" ]
        target => "@logtimestamp"
      }
    }
    #/REDACTED#
  }
  #/REDACTED GROUP#
}

output {
  elasticsearch {
    hosts => ["REDACTED"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}

gsub is flat-out regexp (Mutate filter plugin | Logstash Reference [8.11] | Elastic), so it's going to be expensive.
The ruby code looks OK, but it is also relatively expensive.

What is that all about?

Why not just use mutate (Mutate filter plugin | Logstash Reference [8.11] | Elastic)?

Also, just do:

grok {
  match => { "message" => [ "PATTERN1", "PATTERN2" ] }
}

I wrote the
if [ip] =~ <ip regex>
code a while back. It's an attempt to run the geoip filter only when the field actually contains a valid IP. I just took the regexes from the %{IP} pattern as they were.
I was having cases where the fields in question would contain either a valid IP or something like 'unknown', so it was just a proactive attempt to skip geo filtering when it wasn't needed.
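(Writing that out, it occurs to me that for my data a much simpler guard might do; rough sketch below, where 'unknown' is just the junk value I actually see in that field, not a general rule:)

if [RemoteAddress] and [RemoteAddress] != "unknown" {
  geoip {
    source => "RemoteAddress"
    database => "/etc/logstash/GeoLiteCity.dat"
  }
}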

Regarding the mutate:
I could use mutate replace. I am curious what the differences are. Does gsub by itself bring that much overhead? What does mutate replace do that makes it cheaper? (Questions for my own learning.)

For PATTERN1, PATTERN2:
I can make that change as well. A similar question, though: is there much performance overhead in having the patterns in multiple match blocks?

--

The OutOfMemory problem is fairly recent; I believe I started noticing it after adding the last three 'sections'. I was thinking maybe it had something to do with all of the %{QUOTEDSTRING}s, %{NOTSPACE}s, and negative lookaheads I added. Is your thinking that this is more related to the overuse of gsub and the overhead it brings with it (regardless of what matching is happening)?

And, thank you very much for looking over my stuff and providing feedback. I'll be implementing some, if not all, of those tweaks in the next day or two.

Understood, it's just expensive.

It's a straight (aka "dumb") replace, rather than a regexp check and then replace.
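For example, the bytes cleanup from your config could be a straight replace instead of a gsub (rough sketch based on what you posted; the behavior should be the same):

if [bytes] == "-" {
  mutate {
    replace => { "bytes" => "0" }
  }
}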

Nah, just cleaner :slight_smile:
You could also build your own patterns file and move things there too.
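Something like this, roughly (the file location and pattern name here are just placeholders):

# /etc/logstash/patterns/extra
NOMATCH (.|\r|\n)*

# and then in the filter:
grok {
  patterns_dir => ["/etc/logstash/patterns"]
  match => { "message" => [ "PATTERN1", "%{NOMATCH:NOMATCH}" ] }
}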

Generally, OOM occurs when there is too much in the heap, obviously. The why could be many things, such as a super-big message coming in, or a regexp taking up that last bit of memory.

I am not a super awesome logstash person like others, but just pointing out what I can in the hopes that general improvements may help.

How much heap does LS have?

How much heap does LS have?

1G, it seems. I tried setting the env variable 'LS_HEAP_SIZE=2048m' but it doesn't seem to recognize it.
I included that in my init script, so it's definitely part of the environment when the LS process starts. I'm on version 2.1 currently, and I'm in the process of rolling out 2.3 through my environment chain.

Could you run ps -aef | grep logstash and post the result here? It should show all the switches supplied to Logstash. With a proper setting you should see something like -Xmx2048m -Xss2048k, and these switches should be present only once.

Sure

(15:52:14) PROD ->ps -aef | grep logstash
root     54522     1 99 15:52 pts/0    00:00:17 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logstash/current/heapdump.hprof -Xmx1g -Xss2048k -Djffi.boot.library.path=/opt/logstash/logstash-2.1.0/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logstash/current/heapdump.hprof -Xbootclasspath/a:/opt/logstash/logstash-2.1.0/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/opt/logstash/logstash-2.1.0/vendor/jruby -Djruby.lib=/opt/logstash/logstash-2.1.0/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /opt/logstash/current/lib/bootstrap/environment.rb logstash/runner.rb agent -f /etc/logstash/indexer1.conf &


If it's helpful, here's the latest log output after the process went OOM again.

java.lang.OutOfMemoryError: Java heap space
Dumping heap to /opt/logstash/current/heapdump.hprof ...
Heap dump file created [1540367433 bytes in 27.384 secs]
{:timestamp=>"2016-08-02T15:58:32.002000+0000", :message=>"Exception in filterworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash.", "exception"=>java.lang.OutOfMemoryError: Java heap space, "backtrace"=>["org.joni.StackMachine.ensure1(StackMachine.java:98)", "org.joni.StackMachine.push(StackMachine.java:162)", "org.joni.StackMachine.pushAlt(StackMachine.java:200)", "org.joni.ByteCodeMachine.opPush(ByteCodeMachine.java:1517)", "org.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:272)", "org.joni.Matcher.matchCheck(Matcher.java:304)", "org.joni.Matcher.searchInterruptible(Matcher.java:457)", "org.jruby.RubyRegexp$SearchMatchTask.run(RubyRegexp.java:273)", "org.jruby.RubyThread.executeBlockingTask(RubyThread.java:1066)", "org.jruby.RubyRegexp.matcherSearch(RubyRegexp.java:235)", "org.jruby.RubyRegexp.search19(RubyRegexp.java:1780)", "org.jruby.RubyRegexp.matchPos(RubyRegexp.java:1720)", "org.jruby.RubyRegexp.match19Common(RubyRegexp.java:1701)", "org.jruby.RubyRegexp.match_m19(RubyRegexp.java:1680)", "org.jruby.RubyRegexp$INVOKER$i$match_m19.call(RubyRegexp$INVOKER$i$match_m19.gen)", "org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(JavaMethod.java:350)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)", "rubyjit.Grok$$match_and_capture_5fff680283e4b4ebc4c0eb1732d88dbb187668931442407170.file(/opt/logstash/current/vendor/bundle/jruby/1.9/gems/jls-grok-0.11.2/lib/grok-pure.rb:177)", "rubyjit.Grok$$match_and_capture_5fff680283e4b4ebc4c0eb1732d88dbb187668931442407170.file(/opt/logstash/current/vendor/bundle/jruby/1.9/gems/jls-grok-0.11.2/lib/grok-pure.rb)", "org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:201)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:177)", "org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:188)", "rubyjit.LogStash::Filters::Grok$$match_against_groks_f366a5df68c47d69f27e877618fa5dabc744d0fd1442407170.block_0$RUBY$file(/opt/logstash/current/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.2/lib/logstash/filters/grok.rb:327)", "rubyjit$LogStash::Filters::Grok$$match_against_groks_f366a5df68c47d69f27e877618fa5dabc744d0fd1442407170$block_0$RUBY$file.call(rubyjit$LogStash::Filters::Grok$$match_against_groks_f366a5df68c47d69f27e877618fa5dabc744d0fd1442407170$block_0$RUBY$file)", "org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)", "org.jruby.runtime.Block.yield(Block.java:142)", "org.jruby.RubyArray.eachCommon(RubyArray.java:1606)", "org.jruby.RubyArray.each(RubyArray.java:1613)", "org.jruby.RubyArray$INVOKER$i$0$0$each.call(RubyArray$INVOKER$i$0$0$each.gen)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)", "org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)", "rubyjit.LogStash::Filters::Grok$$match_against_groks_f366a5df68c47d69f27e877618fa5dabc744d0fd1442407170.file(/opt/logstash/current/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.2/lib/logstash/filters/grok.rb:325)"], :level=>:error}
Error: Your application used more memory than the safety cap of 1G.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace

Yeah, that didn't work. Can you show where exactly you tried to set this env variable?

I have an init script that I use to start logstash as a service.

In the init script I have this line:

LS_HEAP_SIZE=2048m

Head of the init.d file:

#!/bin/bash
#
# Logstash Indexer init file for starting up the Logstash Indexer daemon
#
# chkconfig:   - 20 80
# description: Starts and stops the  Logstash Indexer daemon.
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="Logstash Indexer "
NAME=logstash
DAEMON=/opt/logstash/current/bin/$NAME
DAEMON_ARGS="-f /etc/logstash/indexer1.conf &"
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
LOG=/opt/logstash/current/bin/logstash_init.log

LS_HEAP_SIZE=2048m

Yes, you set it in the init.d environment but depending on how you start logstash after that it might or might not make it to the logstash process itself.

I start logstash with a service command. It should take the vars into account. I use that for setting the sincedb_path in other cases.

I am not good enough with shell scripting to debug it just by looking at pieces of it, but I am pretty sure the issue is the variable somehow not getting through.

So some new info to bring to the table.

  • I was able to figure out why my env variable wasn't being set. I went and compared to other cases where I was setting vars in the init script (and they were working), and I found the difference. I wasn't 'export'ing the variable for memory correctly.

    I changed the line to:

    export LS_HEAP_SIZE=2048m

    And it correctly reflected in the new process when I started it up again.
    However, the process soon began filling up on memory again. The process did not crash because of the memory use, and is still running from yesterday, but 'top' shows a RES of 2.5G. I wouldn't normally care, since the process isn't crashing, but I checked another Logstash process on a different server (running a higher volume of data, and active for MUCH longer), and that process only has a RES of 435m.

  • It seems that the higher memory usage loosely correlates (by my own observations, not rigorous testing) with Redis being backlogged. In most of the recent cases, I find that a system isn't indexing data, and upon investigation I see that the Logstash indexer process has crashed. I originally thought this crash was the root cause of Redis being backlogged (since there's nothing left to take items off of Redis), but now I'm wondering if I'm getting large volume spikes which flood Redis temporarily and cause Logstash to crash, after which Redis keeps filling up, compounding the problem. All of that is speculation, though, and I'm not completely sure how to test it.

A better update.
I think I found a better correlation.
It seems that occasionally I get large debug logs flowing into my system. When Logstash tries to process them, something about the large logs causes a spike in memory.

I'm running some tests now. Along with finding the huge log messages (it's all relative; they're only about 275k bytes per log line), I'm also finding that some of these huge messages have badly formed JSON and are breaking the JSON filter. I'm going to test whether the plugin failing to parse the malformed JSON coincides with Logstash not giving back memory.
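One thing I'm considering trying as part of these tests (just a sketch, and the 10 KB cutoff is arbitrary) is capping the message size before the groks and the json filter ever see it, using the same Logstash 2.x event API as my logfilename ruby filter:

ruby {
  # truncate very long lines up front so huge debug messages don't hit the regexes at full size
  code => "event['message'] = event['message'][0,10240] if event['message'] && event['message'].length > 10240"
}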

So far, none of the testing I've done shows any linkage between the malformed JSON and the high memory usage.
I have discovered, however, that Logstash grows its heap roughly in proportion to the size of the largest individual document it receives, and also to the number of those documents in the file. (My testing was done on a file containing these log messages, not directly against a Redis list.)

The part that confuses me, though: it seems like Logstash never gives BACK memory.

Some of my tests:
Notes:
Both long and short log messages are parsed with the same Grok matcher.
Short log size: around 100-150 bytes per line.
Long log size: around 275,000 bytes (275k) per line.
Logstash heap size was 3G.
I tested using a correctly formed JSON block in the long log messages unless otherwise specified.

  • 100 lines of a short log
    Logstash heap grew to about 250M and stayed there.

  • 10,000 lines of the short log
    Logstash heap grew to about 275M and stayed there.

  • 500 lines of the short log, followed by 1 long log, followed by 500 lines of the short log
    Logstash heap grew to about 275M and stayed there

  • 500 lines of the short log, followed by 50 lines of the long log, followed by 500 lines of the short log
    Logstash heap grew to about 1.7G and stayed there.
    This test was an attempt to see if the Logstash heap would begin to contract once the larger log sizes were flushed through, and smaller logs were being parsed again. Did not happen.

  • 500 lines of the short log, followed by 500 lines of the long log, followed by 500 lines of the short log
    Logstash heap grew to 2.2G and stayed there.
    This test was an attempt to overflow the heap with memory.

  • 2000 lines of the long log
    Logstash heap grew to 2.2G and stayed there.
    I'm not sure why 2.2G is the magic number above which the heap will not go in my testing. But, for the testing, I trimmed out all other filtering except for ONLY the Grok match that was specific to these log formats. In reality my indexer file has about 5 different types of file formats it handles, along with replacements, json filters, geoip filters, and more. So loading those may be what grows the heap higher in the regular systems.

I will probably start doing more testing to see specifically how each piece of the indexer filtering affects the memory usage.

However, is it normal for Logstash to not give back memory once it is no longer needed?

It depends on how you measure this memory. If you look at it as memory used by the Java process, then yes, Java will never give this memory back to the OS. But if you look at the used heap, using VisualVM for example, you should see it going up and down.

Based on the testing I'm doing, it really just seems like it's a combination of:

  • Size of log messages (large messages)
  • Amount of large log messages
  • Amount of filtering logic in place

that leads to the 3+ gig heap usage.
When I stripped out most of the filtering, memory seemed stable; I just needed a lot more of it than I was giving.

I think this answers my question, but I'll probably continue doing testing on each piece of the filtering logic to get a better idea of how each one affects memory usage.

Thanks for your help, guys. I just need to give more memory to Logstash :slight_smile: