Why is my Logstash consuming more than 80% of CPU?

Hi,

I use ELK GA 5.0.0. I have 2 Logstash instances on my Linux box, both of which consume from Kafka. One Logstash instance consumes from a single Kafka topic while the other consumes from 3 other Kafka topics. I designed the instances this way because the first topic has more data than the other 3 topics. Using the top command in Linux, I noticed that Logstash is taking a lot of CPU. I then tried Logstash's hot threads API.
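I called the API with something like the request below (9600 being the HTTP API port from my setup; adjust host and port to yours):

curl -XGET 'http://192.168.0.1:9600/_node/hot_threads?threads=3'

The output looked like this: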

{
  "host" : "my-linux-box",
  "version" : "5.0.0",
  "http_address" : "192.168.0.1:9600",
  "hot_threads" : {
    "time" : "2017-03-07T10:22:03+00:00",
    "busiest_threads" : 3,
    "threads" : [ {
      "name" : "Ruby-0-Thread-11",
      "percent_of_cpu_time" : 80.56,
      "state" : "waiting",
      "path" : "/apps/ELK/Logstash/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-5.0.5/lib/logstash/inputs/kafka.rb:191",
      "traces" : [ "sun.misc.Unsafe.park(Native Method)", "java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)", "java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)", "java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)", "java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)", "java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)", "java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)", "org.jruby.javasupport.util.ObjectProxyCache$Segment.getOrCreate(ObjectProxyCache.java:449)", "org.jruby.javasupport.util.ObjectProxyCache.getOrCreate(ObjectProxyCache.java:164)", "org.jruby.javasupport.Java.getInstance(Java.java:385)" ]
    }, {
      "name" : "Ruby-0-Thread-12",
      "percent_of_cpu_time" : 80.27,
      "state" : "runnable",
      "path" : "/apps/ELK/Logstash/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-5.0.5/lib/logstash/inputs/kafka.rb:191",
      "traces" : [ "java.lang.System.identityHashCode(Native Method)", "org.jruby.javasupport.util.ObjectProxyCache.hash(ObjectProxyCache.java:174)", "org.jruby.javasupport.util.ObjectProxyCache.getOrCreate(ObjectProxyCache.java:163)", "org.jruby.javasupport.Java.getInstance(Java.java:385)", "org.jruby.javasupport.Java.getInstance(Java.java:364)", "org.jruby.javasupport.JavaUtil.convertJavaToUsableRubyObject(JavaUtil.java:194)", "org.jruby.javasupport.JavaMethod.convertReturn(JavaMethod.java:513)", "org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:437)", "org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:300)", "org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:35)" ]
    }, {
      "name" : "Ruby-0-Thread-13",
      "percent_of_cpu_time" : 80.21,
      "state" : "blocked",
      "path" : "/apps/ELK/Logstash/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-5.0.5/lib/logstash/inputs/kafka.rb:191",
      "traces" : [ "java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:107)", "org.jruby.javasupport.util.ObjectProxyCache$Segment.expunge(ObjectProxyCache.java:320)", "org.jruby.javasupport.util.ObjectProxyCache$Segment.getOrCreate(ObjectProxyCache.java:451)", "org.jruby.javasupport.util.ObjectProxyCache.getOrCreate(ObjectProxyCache.java:164)", "org.jruby.javasupport.Java.getInstance(Java.java:385)", "org.jruby.javasupport.Java.getInstance(Java.java:364)", "org.jruby.javasupport.JavaUtil.convertJavaToUsableRubyObject(JavaUtil.java:194)", "org.jruby.javasupport.JavaMethod.convertReturn(JavaMethod.java:513)", "org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:437)", "org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:300)" ]
    } ]
  }
}

It would be really helpful if somebody could tell me why this is happening and how to fix it.

Thank you.

How many events are being processed every second?
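If you don't measure this already, the node stats API should give you a rough number; something like the request below, sampled twice over a known interval, lets you diff the events counters:

curl -XGET 'http://localhost:9600/_node/stats/pipeline'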

Hi @magnusbaeck, below is my Kafka input configuration:

kafka {
	group_id => "consumergroup_a"
	topics => ["topic_a"]
	bootstrap_servers => "192.168.0.1:9091,192.168.0.3:9093,192.168.0.2:9092"
	add_field => { "logtype" => "LOGFILE" }
}

Okay, but that doesn't answer my question.

@magnusbaeck sorry, I don't know what you are expecting. What do you mean by events? Is that the number of messages in the Kafka topic every second?

There will be an average of 50 entries every second.

Below is the entire configuration of the Logstash instance that consumes from the 3 Kafka topics and has the higher CPU utilization:

input {
	kafka {
		group_id => "consumergroup_a"
		topics => ["topic_a"]
		bootstrap_servers => "192.168.0.1:9091,192.168.0.3:9093,192.168.0.2:9092"
		codec => json
		add_field => { "logtype" => "LOGFILE" }
	}
	kafka {
		group_id => "consumergroup_b"
		topics => ["topic_b"]
		bootstrap_servers => "192.168.0.1:9091,192.168.0.3:9093,192.168.0.2:9092"
		add_field => { "logtype" => "LOGFILE" }
	}
	kafka {
		group_id => "consumergroup_c"
		topics => ["topic_c"]
		bootstrap_servers => "192.168.0.1:9091,192.168.0.3:9093,192.168.0.2:9092"
		add_field => { "logtype" => "logfile_c" }
	}
}

filter {
	if [logtype] == "LOGFILE" {
		if [fields][logtype] == "logfile_a" {
			grok {
				match => { "message" => "####<(?<timestamp>%{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM) (?:GMT))\> <%{GREEDYDATA:field_a}> <%{GREEDYDATA:field_b}> <%{GREEDYDATA:skip1}> <%{GREEDYDATA:field_c}> <%{GREEDYDATA:skip2}> <%{GREEDYDATA:skip3}> <%{GREEDYDATA:skip4}> <%{GREEDYDATA:field_d}> <%{GREEDYDATA:skip5}> <%{GREEDYDATA:skip6}> <%{GREEDYDATA:field_e}>" }	
			}
			mutate {
				remove_field => ["skip1","skip2","skip3","skip4","skip5","skip6"]
			}
			date {
				match => [ "timestamp", "MMM dd, YYYY hh:mm:ss aa z" ]
				timezone => "GMT"
				target => "@timestamp"
			}
		} else if [fields][logtype] == "logfile_b" {
			grok {
				match => { "message" => "%{NOTSPACE:field_a} %{NOTSPACE:field_b} ('%{NOTSPACE:field_c}'|%{NOTSPACE:field_c}) %{NOTSPACE:field_d} %{NOTSPACE:field_e} %{NOTSPACE:field_f} %{NOTSPACE:field_g} %{NOTSPACE:field_h} %{NOTSPACE:field_i} %{NOTSPACE:field_j} ('%{NOTSPACE:field_k}'|%{NOTSPACE:field_k}) %{NOTSPACE:field_l} ('%{GREEDYDATA:field_m}'|%{NOTSPACE:field_m}) ('%{GREEDYDATA:field_n}'|%{NOTSPACE:field_n}) ('%{GREEDYDATA:field_o}'|%{NOTSPACE:field_o}) ('%{GREEDYDATA:field_p}'|%{NOTSPACE:field_p})" }
			}
			if [field_d] == "-" {
				mutate {
					replace => [ "field_d", "-1" ]
				}
			}
			if [field_f] == "-" {
				mutate {
					replace => [ "field_f", "-1" ]
				}
			}
			mutate {
				add_field => {
					"logtime" => "%{field_a} %{field_b}"
				}
				remove_field => [ "field_a", "field_b" ]
				gsub => ["field_g","\'",""]
			}
			date {
				match => [ "logtime", "YYYY-MM-dd HH:mm:ss" ]
				timezone => "GMT"
				target => "@timestamp"
			}
		}
	} else if [logtype] == "logfile_c" {
		grok {
			match => { "message" => "^##(?<level>(levelA|levelB|levelC|levelD|levelE)) (?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME}) %{NOTSPACE:field_a} %{NOTSPACE:field_b} %{GREEDYDATA:field_c}" }
		}
		date {
			match => [ "timestamp", "dd-MMM-yyyy HH:mm:ss.SSS" ]
			timezone => "UTC"
			target => "@timestamp"
		}
	}
}

output {
	if [logtype] == "LOGFILE" {
		if [fields][logtype] == "logfile_a" {
			elasticsearch {
				hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
				index => "indexa-%{+YYYY-MM-dd}"
				user => "elastic"
				password => "elasticpassword"
			}
		} else if [fields][logtype] == "logfile_b" {
			elasticsearch {
				hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
				index => "indexb-%{+YYYY-MM-dd}"
				user => "elastic"
				password => "elasticpassword"
			}
		}
	} else if [logtype] == "logfile_c" {
		elasticsearch {
			hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
			index => "indexc-%{+YYYY-MM-dd}"
			user => "elastic"
			password => "elasticpassword"
		}
	}
}

50 events/second shouldn't cause any CPU spikes on any reasonably recent machine. Not sure what's up here.

Your first grok pattern has 11 occurrences of GREEDYDATA, which will be very inefficient. I saw a grok pattern later in the config that had a similar structure but instead used NOTSPACE. Would it be possible to replace GREEDYDATA with NOTSPACE in the first grok pattern as well?
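For example, the logfile_a grok could be rewritten along these lines (just a sketch, assuming none of the bracketed values contain spaces):

grok {
	match => { "message" => "####<(?<timestamp>%{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM) (?:GMT))\> <%{NOTSPACE:field_a}> <%{NOTSPACE:field_b}> <%{NOTSPACE:skip1}> <%{NOTSPACE:field_c}> <%{NOTSPACE:skip2}> <%{NOTSPACE:skip3}> <%{NOTSPACE:skip4}> <%{NOTSPACE:field_d}> <%{NOTSPACE:skip5}> <%{NOTSPACE:skip6}> <%{NOTSPACE:field_e}>" }
}

If any of those values can contain spaces, a custom pattern like (?<field_a>[^>]*) is still much cheaper than GREEDYDATA, since it can't run past the closing bracket and force backtracking.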

You can see my entire configuration above; if you need any other traces or settings, I can provide them. Thanks for your help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.