Logstash is calculating itself to sudden death

Haenschen · December 13, 2017, 8:01am

Hi folks,

I need your advice solving following problem: I am going to parse the Logfiles from Akamai CDN using Logstash. In order to do this I'm using the following Logstash config:

input {
file {
max_open_files => 1000000
id => "akamai-cdn"
type => "akamai-plain"
path => "/data/logs/akamai-decompressed/*"
start_position => "beginning"
codec => plain
}
}
filter {
grok {
match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\s%{IP:clientip}\s%{WORD:httpmethod}\t%{PATH:requestedpage}\s%{NUMBER:responsecode}\s%{NUMBER:bytessent}\s%{NUMBER:timetaken}\t%{DATA:csreferrer}\t%{DATA:csuseragent}\t%{DATA:cscookie}" }
add_tag => ["akamai-cdn"]
}
date {
match => ["timestamp", "yyyy-MM-dd HH:hh:ss"]
target => "@timestamp"
add_field => {"debug" => "timestampisfixed"}
}

}

#and the es-output...

########## sample log lines#################

2017-11-19	15:14:33	95.90.212.131	GET	/cdn-aka-ee-xxxxxxxx.xxxxxxxx/mall/shopde/pic/tbild3/tbild3-xxxxxxxx.JPG	200	1841	0	"-"	"-"	"-"
2017-11-19	15:14:33	95.90.212.131	GET	/cdn-aka-ee-xxxxxxxx.xxxxxxxxe/mall/shopde/pic/bild0/Bild0-xxxxxxxx.JPG	200	5030	0	"-"	"-"	"-"
2017-11-19	15:14:33	95.90.212.131	GET	/cdn-aka-ee-xxxxxxxx.xxxxxxxxe/mall/shopde/pic/tbild1/tbild1-xxxxxxxx.JPG	200	1715	0	"-"	"-"	"-"

########################################
the "|" chars are in tabs or spaces in the log lines

When I start the pipeline the Logstash is calculating the hell out of itself, all cores are used completly to parse the logs. It's not that big amount of Data, we are talking about 1 million lines...
do i have a bad filter ?

magnusbaeck · December 14, 2017, 7:00am

When I start the pipeline the Logstash is calculating the hell out of itself, all cores are used completly to parse the logs.

Yes. By default Logstash starts as many pipeline workers as you have CPU cores. This is configurable.

It's not that big amount of Data, we are talking about 1 million lines...

The amount of data doesn't affect the CPU usage, a rate measure, only the accumulated CPU time and the wall clock time.

Similarly, an electric stove uses the same amount of power (1 kW or whatever) when you heat water regardless of how much water you have, but if you have more water it'll take longer and the amount of energy used will be higher.

So, if we rephrase your question to "how can I make Logstash use less CPU time" you can start by replacing the DATA patterns with NOTSPACE. Having more than one DATA or GREEDYDATA in the same grok pattern is almost always a mistake.

system · January 11, 2018, 7:01am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Logstash filter for akamai logs Logstash	2	1540	December 7, 2017
Logstash: error when use date filter Logstash	2	498	December 31, 2020
Parsing Akamai logs Logstash	4	1274	October 11, 2018
Akamai logs to logstash Logstash	2	1954	July 12, 2019
AKAMAI => Filebeat => Logstash Beats filebeat	7	3114	January 9, 2018

Logstash is calculating itself to sudden death

Related topics