Worker thread consumes 100 % CPU (for no reason)

Hi,

I've recently set up a full ELK stack. Things are not playing nice, though: the Logstash worker thread is constantly consuming 100 % of one of my CPU cores, even when there is nothing to ship or parse. Using multiple worker threads just loads multiple cores at 100 %.

My config is fairly simple and consists of two files at the moment.

injector.conf:

#Injector
input {
  file {
    type => "application1"
    path => "/var/log/application1/debug.log"
    codec => multiline {
      pattern => "%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }

  file {
    type => "application1"
    path => "/var/log/application1/error.log"
    codec => multiline {
      pattern => "%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}

output {
  if [type] == "application1" {
    rabbitmq {
      exchange => "logs-test"
      host => "127.0.0.1"
      exchange_type => "topic"
      key => "logs.%{[type]}.%{host}"
    }
  }
}

parser.conf:

#Parser
input {
  rabbitmq {
    queue => "logs-test"
    key => "#"
    host => "127.0.0.1"
    exchange => "logs-test"
  }
}

filter {
  if [type] == "application1" {
    grok {
      add_field => {
        "environment" => "test"
      }
      match => {
        "message" => "(?m)(?<call-id>\w+@(\w+\.?)+)?%{SPACE}?(?<fs-timestamp>20(\d{2}\-?){3}\s(\d{2}[:\.\b]){3}\d*) \[%{LOGLEVEL:loglevel}\] (?<function>\b\w+\.\w:\d*\b) %{GREEDYDATA:message}"
      }
      overwrite => [ "message" ]
    }

    mutate {
      lowercase => [ "loglevel" ]
      gsub => [ "message", "\w+@(\w+\.?)+\s", "" ]
    }
  }
}

output {
  if [type] == "application1" {
    elasticsearch {
      host => "elasticsearch.test"
      protocol => "transport"
      port => 9300
      cluster => "elasticsearch"
    }
  }
}

I have seen elsewhere that the multiline codec can cause this, but I have tried running without the codec and I still get the same high CPU usage.

Running on CentOS 6.6 with

# /opt/logstash/bin/logstash --version
logstash 1.4.2-modified
jruby 1.7.11 (1.9.3p392) 2014-02-24 86339bb on OpenJDK 64-Bit Server VM 1.7.0_75-mockbuild_2015_01_20_23_39-b00 [linux-amd64]
java 1.7.0_75 (Oracle Corporation)
jvm OpenJDK 64-Bit Server VM / 24.75-b04
Elasticsearch: Version: 1.1.1, Build: f1585f0/2014-04-16T14:27:12Z, JVM: 1.7.0_75

Perhaps run jstack to see which threads are stuck where? http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html
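
For example, something like this takes a dump of the Logstash JVM (a rough sketch; the pgrep lookup is an assumption, adjust it to however you identify the Logstash process on your box):

# Dump all thread stacks of the Logstash JVM to a timestamped file
jstack $(pgrep -f logstash) > /tmp/logstash-jstack-$(date +%s).txt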


I'm not sure what to look for, but the jstack output can be seen in this gist: https://gist.github.com/kaaelhaa/15b3b921e8aa018ca65a

The worker thread in your stack dump is here:

"|worker" daemon prio=10 tid=0x00007f7e54107800 nid=0x7b69 runnable [0x00007f7e94fba000]
   java.lang.Thread.State: RUNNABLE
        at org.joni.ByteCodeMachine.opCClass(ByteCodeMachine.java:711)
        at org.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:220)
        at org.joni.Matcher.matchCheck(Matcher.java:304)
        at org.joni.Matcher.searchInterruptible(Matcher.java:480)
...
        at rubyjit.LogStash::Filters::Mutate$$gsub_09662a8d1829f3bda99ebf19420a4d91e67ace5f574906110.block_0$RUBY$__file__(/opt/logstash/lib/logstash/filters/mutate.rb:315)

This means that at the time of the dump, the worker was executing a regular expression. Further down you'll see the mutate filter mentioned.

If you take several jstack dumps and the mutate filter shows up in that thread's stack every time (or almost every time), it's reasonable to believe that the mutate filter is the high CPU consumer here.

Take a few dumps and see if you can find any common patterns (words, etc.) in the stack dump for the |worker thread.
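
Something along these lines works for grabbing a handful of dumps in a row (a rough sketch, assuming pgrep finds a single Logstash process):

# Take five dumps two seconds apart, keeping only the |worker thread's stack
for i in 1 2 3 4 5; do
  jstack $(pgrep -f logstash) | grep -A 30 '"|worker"'
  echo '----'
  sleep 2
done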


Thank you for clarifying that.

It was indeed a poorly performing regex in the mutate filter (\w+@(\w+\.?)+\s). After I crushed that one, there was a similarly bad regex in the grok filter. After fixing both, all is good and Logstash (all processes combined) is using just above 1 % CPU under normal load.
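
For anyone running into the same thing: the nested quantifier in (\w+\.?)+ is the classic recipe for catastrophic backtracking. One way to rewrite the gsub is to flatten it into a character class; this is a sketch of the idea rather than the exact pattern I ended up with:

mutate {
  lowercase => [ "loglevel" ]
  # [\w.]+ covers the same call-id text without nested quantifiers,
  # so the engine can't backtrack exponentially on lines that don't match
  gsub => [ "message", "\w+@[\w.]+\s", "" ]
}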
