Logstash consuming a lot of CPU


(Saket Kumar) #1

I am using Logstash 2.4 to process multiple files. To process all of them at the same time, I launch multiple Logstash instances with "logstash -f file.conf < file1" on a Windows Server. I don't need to tail the files; each one has to be processed only once.

I tried tweaking LS_HEAP and the number of workers, but saw no improvement.

Elasticsearch is also running on the same server. With CPU utilization at 99%, other processes are hampered.

I am looking for ways to reduce the CPU utilization of Logstash.

The expected load is high: there can be 100 files to process at the same time, and at this level of CPU utilization that would be difficult to manage.

Please suggest a better option, if any.


(Mark Walkom) #2

Why are you running such an old version? The latest is 6.1.1, you should really upgrade.

What JVM are you using? What does your config file look like?


(Saket Kumar) #3

I am using JVM 1.8, and the config file looks as follows:

input {
  stdin { }
}
filter {
  if ([message] =~ "responseCode") {
    drop { }
  } else {
    csv {
      separator => ","
      columns => ["timeStamp", "CompTime", "label", "Code", "Response", "threadName", "dataType", "success", "failureMessage", "bytes", "VUsers", "Vuser_all", "URL", "TTFB", "Encoding", "SampleCount", "ErrorCount", "Hostname", "ThinkTime", "ConnectionTime"]
    }
  }
  grok
  mutate {
    split => { "filename" => "_" }
    add_field => { "Project" => "%{[filename][0]}" }
    add_field => { "RunID" => "%{[filename][1]}" }
  }
  date {
    locale => "en"
    match => ["timeStamp", "yyyy/MM/dd HH:mm:ss.SSS", "UNIX_MS"]
    target => "timeStamp"
    timezone => "Asia/Kolkata"
  }
  mutate {
    split => { "label" => "-" }
    add_field => { "Scenario" => "%{[label][0]}" }
    add_field => { "Transaction" => "%{[label][1]}" }
    add_field => { "Request" => "%{[label][2]}" }
  }
  if [Transaction] == "%{[label][1]}" {
    mutate { replace => { "Transaction" => "NULL" } }
  }
  if [Request] == "%{[label][2]}" {
    mutate { replace => { "Request" => "NULL" } }
  }

  if [success] == "true" or [success] == "TRUE" {
    mutate { add_field => { "PassCount" => "1" } }
    mutate { add_field => { "FailCount" => "0" } }
  }
  if [success] == "false" or [success] == "FALSE" {
    mutate { add_field => { "PassCount" => "0" } }
    mutate { add_field => { "FailCount" => "1" } }
  }
  ruby {
    code => "
      event['ServeTime'] = event['CompTime'].to_i - event['TTFB'].to_i
    "
  }
  ruby {
    code => "
      vartime = ENV['envtime']
      if vartime.nil?
        start_t = event['timeStamp'].to_i
        end_t = event['timeStamp'].to_i
        ENV['envtime'] = start_t.to_s
        diff = end_t - start_t
        event['RT'] = Time.at(diff.to_i.abs).utc.strftime('%H:%M:%S')
      else
        start_t = vartime.to_i
        end_t = event['timeStamp'].to_i
        diff = end_t - start_t
        event['RT'] = Time.at(diff.to_i.abs).utc.strftime('%H:%M:%S')
      end
    "
  }
  date {
    locale => "en"
    match => ["RT", "HH:mm:ss"]
    target => "RelativeTime"
    timezone => "Asia/Kolkata"
  }
  mutate { convert => ["CompTime", "integer"] }
  mutate { convert => ["ServeTime", "integer"] }
  mutate { convert => ["Code", "string"] }
  mutate { convert => ["bytes", "integer"] }
  mutate { convert => ["VUsers", "integer"] }
  mutate { convert => ["Vuser_all", "integer"] }
  mutate { convert => ["TTFB", "integer"] }
  mutate { convert => ["SampleCount", "integer"] }
  mutate { convert => ["ErrorCount", "integer"] }
  mutate { convert => ["ThinkTime", "integer"] }
  mutate { convert => ["PassCount", "integer"] }
  mutate { convert => ["FailCount", "integer"] }
  mutate { convert => ["ConnectionTime", "integer"] }
  mutate { lowercase => ["Project"] }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "logstash-%{Project}-%{+YYYY.MM.dd}"
  }
  stdout { }
}


#4

Why are you using Logstash 2.4, though? We're up to 6.1.


(Mark Walkom) #5

Definitely this, there are a number of improvements to performance you would benefit from.


(Magnus Bäck) #6

> I tried tweaking LS_HEAP and the number of workers, but saw no improvement.

Lowering the number of pipeline workers to one will (for a multi-core system) limit the amount of CPU used by Logstash.

> Elasticsearch is also running on the same server. With CPU utilization at 99%, other processes are hampered.

You should consider running Logstash at a lower priority to reduce its impact on the system's performance. 99% CPU utilization typically isn't a problem if the process(es) using the CPU are pre-empted by just about any other process.

The throttle filter can also be helpful.
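
As a rough illustration of the throttle filter suggestion, here is a minimal sketch. The counts, key, tag name, and the pairing with the sleep filter are assumptions for illustration, not settings taken from this thread, and the option names should be checked against the plugin versions actually installed. The throttle filter only marks events that exceed the configured rate (here by adding a tag); it does not slow the pipeline on its own. The sketch therefore pauses briefly on tagged events instead of dropping them, which keeps all data at the cost of a longer run.

```conf
filter {
  throttle {
    key => "%{host}"        # assumed key: one counter per source host
    after_count => 1000     # assumed limit: events beyond 1000 per period get tagged
    period => "1"           # length of the counting period, in seconds
    max_age => 2            # keep counters for twice the period
    add_tag => "throttled"  # hypothetical tag name
  }
  if "throttled" in [tags] {
    # Assumes the sleep filter plugin is available; pauses 0.1 s per tagged event.
    sleep { time => "0.1" }
  }
}
```

These blocks would sit inside the existing filter section. Raising the sleep time or lowering after_count trades throughput for lower CPU use.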


(Saket Kumar) #7

Sure, I will give these a try. Thanks.


(Saket Kumar) #8

I am using the multiline filter plugin, which is deprecated from 5.x onward.
I gave it a try but ran into a limit on event size, "max_lines =>". I changed it, but with no success. I want to process the entire event, but it was cut off at 900 lines. The XML file I am processing contains more than a thousand lines.
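
If the limit being hit is the max_lines setting of the multiline codec, the sketch below shows where that setting and the related max_bytes cap are configured. The pattern is a hypothetical placeholder that treats every line not starting with an XML declaration as a continuation of the previous event, and the limits are arbitrary example values. The codec documents defaults of 500 lines and 10 MiB, and an event can be cut off by either cap, so both are raised here.

```conf
input {
  stdin {
    codec => multiline {
      pattern => "^<\?xml"    # hypothetical: the line that starts each XML document
      negate => true          # lines NOT matching the pattern ...
      what => "previous"      # ... are appended to the previous event
      max_lines => 5000       # example value; the documented default is 500
      max_bytes => "20 MiB"   # example value; the documented default is "10 MiB"
    }
  }
}
```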

