Logstash is calculating itself to sudden death

Hi folks,

I need your advice on the following problem: I am going to parse the log files from the Akamai CDN with Logstash. To do this, I'm using the following Logstash config:

input {
  file {
    max_open_files => 1000000
    id => "akamai-cdn"
    type => "akamai-plain"
    path => "/data/logs/akamai-decompressed/*"
    start_position => "beginning"
    codec => plain
  }
}
filter {
  grok {
    match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\s%{IP:clientip}\s%{WORD:httpmethod}\t%{PATH:requestedpage}\s%{NUMBER:responsecode}\s%{NUMBER:bytessent}\s%{NUMBER:timetaken}\t%{DATA:csreferrer}\t%{DATA:csuseragent}\t%{DATA:cscookie}" }
    add_tag => ["akamai-cdn"]
  }
  date {
    match => ["timestamp", "yyyy-MM-dd HH:hh:ss"]
    target => "@timestamp"
    add_field => {"debug" => "timestampisfixed"}
  }
}

# ...and the Elasticsearch output

Sample log lines:

2017-11-19 15:14:33 95.90.212.131 GET /cdn-aka-ee-xxxxxxxx.xxxxxxxx/mall/shopde/pic/tbild3/tbild3-xxxxxxxx.JPG 200 1841 0 "-" "-" "-"
2017-11-19 15:14:33 95.90.212.131 GET /cdn-aka-ee-xxxxxxxx.xxxxxxxxe/mall/shopde/pic/bild0/Bild0-xxxxxxxx.JPG 200 5030 0 "-" "-" "-"
2017-11-19 15:14:33 95.90.212.131 GET /cdn-aka-ee-xxxxxxxx.xxxxxxxxe/mall/shopde/pic/tbild1/tbild1-xxxxxxxx.JPG 200 1715 0 "-" "-" "-"

In the real log lines, the fields are separated by tabs or spaces.

When I start the pipeline, Logstash is calculating the hell out of itself: all cores are completely used up parsing the logs. It's not that big an amount of data; we are talking about 1 million lines.
Do I have a bad filter?

> When I start the pipeline, Logstash is calculating the hell out of itself: all cores are completely used up parsing the logs.

Yes. By default Logstash starts as many pipeline workers as you have CPU cores. This is configurable.
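As a minimal sketch, the worker count can be lowered in `logstash.yml`. You can also set it for a single run with the `-w` / `--pipeline.workers` command-line flag. The value `2` is only an example. Fewer workers lower the peak CPU usage but lengthen the run, and the total CPU time stays roughly the same.

```yaml
# logstash.yml: limit the number of pipeline worker threads
# (if unset, Logstash uses the number of CPU cores on the host)
pipeline.workers: 2
```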

> It's not that big an amount of data; we are talking about 1 million lines.

The amount of data doesn't affect CPU usage, which is a rate. It only affects the accumulated CPU time and the wall-clock time.

Similarly, an electric stove uses the same amount of power (1 kW or whatever) when you heat water regardless of how much water you have, but if you have more water it'll take longer and the amount of energy used will be higher.
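The same relationship can be written out. The symbols are illustrative and do not come from the thread: W busy workers at utilization u (per core), N lines, a throughput of r lines per second, and c as the CPU cost per line.

```latex
\begin{aligned}
E &= P\,t && \text{(stove: constant power } P\text{; energy grows with time } t\text{)}\\
t_{\text{wall}} &= \frac{N}{r}\\
T_{\text{CPU}} &= u\,W\,t_{\text{wall}} = N\,c, \qquad c = \frac{u\,W}{r}
\end{aligned}
```

The utilization u, which is what a tool like `top` shows as CPU usage, does not depend on N. Only t_wall and T_CPU grow with it. For a fixed N, T_CPU can only be cut by lowering c, and a cheaper filter does exactly that. As a worked example with hypothetical numbers (W = 8, u = 1, N = 10^6, r = 5000 lines/s), the run takes t_wall = 200 s and T_CPU = 1600 CPU-seconds, which gives c = 1.6 ms per line.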

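So, if we rephrase your question as "how can I make Logstash use less CPU time?", you can start by replacing the DATA patterns with NOTSPACE. DATA (`.*?`) can match anything, including the tabs that separate the fields. With several DATA patterns in one expression, the regex engine may try many ways of splitting a line before it succeeds or gives up. NOTSPACE (`\S+`) cannot cross a separator. Having more than one DATA or GREEDYDATA in the same grok pattern is almost always a mistake.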
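Below is a sketch of the question's filter section with that change applied. It is not tested against real Akamai logs, and it makes a few further changes:

- It anchors the pattern with `^`, so a non-matching line fails quickly instead of being retried at every offset.
- It replaces `DATE_EU`, which expects day, month, year in that order, with an explicit `yyyy-MM-dd` capture that matches the sample lines.
- It fixes the date filter. In the question, that filter refers to a `timestamp` field the grok never creates, and it uses `hh` where minutes (`mm`) are meant.

The `logtime` field is a hypothetical helper name. The separators are copied from the original pattern, so adjust them if your lines mix tabs and spaces differently.

```
filter {
  grok {
    # Anchored with ^; NOTSPACE (\S+) stops at a separator, unlike DATA (.*?)
    match => { "message" => "^(?<date>%{YEAR}-%{MONTHNUM}-%{MONTHDAY})\t%{TIME:time}\s%{IP:clientip}\s%{WORD:httpmethod}\t%{PATH:requestedpage}\s%{NUMBER:responsecode}\s%{NUMBER:bytessent}\s%{NUMBER:timetaken}\t%{NOTSPACE:csreferrer}\t%{NOTSPACE:csuseragent}\t%{NOTSPACE:cscookie}" }
    # Only added when the match succeeds: join date and time for the date filter
    add_field => { "logtime" => "%{date} %{time}" }
    add_tag => ["akamai-cdn"]
  }
  date {
    # HH = hour of day, mm = minutes, ss = seconds
    match => ["logtime", "yyyy-MM-dd HH:mm:ss"]
    target => "@timestamp"
    remove_field => ["logtime"]
  }
}
```

In the sample lines, the referrer, user-agent and cookie fields are all `"-"`. If the real values can contain spaces, NOTSPACE will not match them. In that case, a custom capture that stops at the next tab keeps the same benefit, for example `(?<csuseragent>[^\t]*)`.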
