Worker thread consumes 100 % CPU (for no reason)

Hi,

I've recently set up a full ELK stack. Things are not playing nice, though: the Logstash worker thread is constantly consuming 100 % of one of my CPU cores, even when there is nothing to ship or parse. Using multiple worker threads just loads multiple cores at 100 %.

My config is fairly simple and consists of two files at the moment.

injector.conf:

#Injector
input {
  file {
    type => "application1"
    path => "/var/log/application1/debug.log"
    codec => multiline {
      pattern => "%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }

  file {
    type => "application1"
    path => "/var/log/application1/error.log"
    codec => multiline {
      pattern => "%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}

output {
  if [type] == "application1" {
    rabbitmq {
      exchange => "logs-test"
      host => "127.0.0.1"
      exchange_type => "topic"
      key => "logs.%{[type]}.%{host}"
    }
  }
}

parser.conf:

#Parser
input {
  rabbitmq {
    queue => "logs-test"
    key => "#"
    host => "127.0.0.1"
    exchange => "logs-test"
  }
}

filter {
  if [type] == "application1" {
    grok {
      add_field => {
        "environment" => "test"
      }
      match => {
        "message" => "(?m)(?<call-id>\w+@(\w+\.?)+)?%{SPACE}?(?<fs-timestamp>20(\d{2}\-?){3}\s(\d{2}[:\.\b]){3}\d*) \[%{LOGLEVEL:loglevel}\] (?<function>\b\w+\.\w:\d*\b) %{GREEDYDATA:message}"
      }
      overwrite => [ "message" ]
    }

    mutate {
      lowercase => [ "loglevel" ]
      gsub => [ "message", "\w+@(\w+\.?)+\s", "" ]
    }
  }
}

output {
  if [type] == "application1" {
    elasticsearch {
      host => "elasticsearch.test"
      protocol => "transport"
      port => 9300
      cluster => "elasticsearch"
    }
  }
}

I have seen elsewhere that the multiline codec can cause this, but I have tried running without the codec and I still get the same high CPU usage.

Running on CentOS 6.6 with

# /opt/logstash/bin/logstash --version
logstash 1.4.2-modified
jruby 1.7.11 (1.9.3p392) 2014-02-24 86339bb on OpenJDK 64-Bit Server VM 1.7.0_75-mockbuild_2015_01_20_23_39-b00 [linux-amd64]
java 1.7.0_75 (Oracle Corporation)
jvm OpenJDK 64-Bit Server VM / 24.75-b04
Elasticsearch: Version: 1.1.1, Build: f1585f0/2014-04-16T14:27:12Z, JVM: 1.7.0_75

Perhaps run jstack to see which threads are stuck where? http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html
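
For example, something like this takes a dump of the Logstash JVM (a rough sketch; the pgrep lookup is an assumption, adjust it to however you identify the Logstash process on your box):

# Dump all thread stacks of the Logstash JVM to a timestamped file
jstack $(pgrep -f logstash) > /tmp/logstash-jstack-$(date +%s).txt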


I'm not sure what to look for, but the jstack output can be seen in this gist: https://gist.github.com/kaaelhaa/15b3b921e8aa018ca65a

The worker thread in your stack dump is here:

"|worker" daemon prio=10 tid=0x00007f7e54107800 nid=0x7b69 runnable [0x00007f7e94fba000]
   java.lang.Thread.State: RUNNABLE
        at org.joni.ByteCodeMachine.opCClass(ByteCodeMachine.java:711)
        at org.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:220)
        at org.joni.Matcher.matchCheck(Matcher.java:304)
        at org.joni.Matcher.searchInterruptible(Matcher.java:480)
...
        at rubyjit.LogStash::Filters::Mutate$$gsub_09662a8d1829f3bda99ebf19420a4d91e67ace5f574906110.block_0$RUBY$__file__(/opt/logstash/lib/logstash/filters/mutate.rb:315)

This means that at the time of the dump, the worker was executing a regular expression. Further down you'll see the mutate filter mentioned.

If you take several jstack dumps and the mutate filter shows up in that thread's stack every time (or almost every time), it's reasonable to believe that the mutate filter is the high CPU consumer here.

Take a few dumps and see if you can find any common patterns (words, etc.) in the stack dump for the |worker thread.
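
Something along these lines works for grabbing a handful of dumps in a row (a rough sketch, assuming pgrep finds a single Logstash process):

# Take five dumps two seconds apart, keeping only the |worker thread's stack
for i in 1 2 3 4 5; do
  jstack $(pgrep -f logstash) | grep -A 30 '"|worker"'
  echo '----'
  sleep 2
done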


Thank you for clarifying that.

It was indeed a poorly performing regex in the mutate filter (\w+@(\w+\.?)+\s). After I crushed that one, there was a similarly bad regex in the grok filter. After fixing both, all is good and Logstash (all processes combined) is using just above 1 % CPU under normal load.
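
For anyone running into the same thing: the nested quantifier in (\w+\.?)+ is the classic recipe for catastrophic backtracking. One way to rewrite the gsub is to flatten it into a character class; this is a sketch of the idea rather than the exact pattern I ended up with:

mutate {
  lowercase => [ "loglevel" ]
  # [\w.]+ covers the same call-id text without nested quantifiers,
  # so the engine can't backtrack exponentially on lines that don't match
  gsub => [ "message", "\w+@[\w.]+\s", "" ]
}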
