Hi folks,
I need your advice on the following problem: I'm parsing the logfiles from the Akamai CDN with Logstash, using this config:
input {
  file {
    max_open_files => 1000000
    id => "akamai-cdn"
    type => "akamai-plain"
    path => "/data/logs/akamai-decompressed/*"
    start_position => "beginning"
    codec => plain
  }
}
filter {
  grok {
    match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\s%{IP:clientip}\s%{WORD:httpmethod}\t%{PATH:requestedpage}\s%{NUMBER:responsecode}\s%{NUMBER:bytessent}\s%{NUMBER:timetaken}\t%{DATA:csreferrer}\t%{DATA:csuseragent}\t%{DATA:cscookie}" }
    add_tag => ["akamai-cdn"]
  }
  date {
    match => ["timestamp", "yyyy-MM-dd HH:hh:ss"]
    target => "@timestamp"
    add_field => {"debug" => "timestampisfixed"}
  }
}
# ... and the es-output follows here.
########## sample log lines #################
2017-11-19 | 15:14:33 | 95.90.212.131 | GET | /cdn-aka-ee-xxxxxxxx.xxxxxxxx/mall/shopde/pic/tbild3/tbild3-xxxxxxxx.JPG | 200 | 1841 | 0 | "-" | "-" | "-"
2017-11-19 | 15:14:33 | 95.90.212.131 | GET | /cdn-aka-ee-xxxxxxxx.xxxxxxxxe/mall/shopde/pic/bild0/Bild0-xxxxxxxx.JPG | 200 | 5030 | 0 | "-" | "-" | "-"
2017-11-19 | 15:14:33 | 95.90.212.131 | GET | /cdn-aka-ee-xxxxxxxx.xxxxxxxxe/mall/shopde/pic/tbild1/tbild1-xxxxxxxx.JPG | 200 | 1715 | 0 | "-" | "-" | "-"
############################################
(The "|" characters above stand in for the tabs and spaces that separate the fields in the real log lines.)
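To show the shape of one record, here is a minimal Python sketch of a sample line (I've written every separator as a tab and used a placeholder path, although the real logs mix tabs and spaces):

```python
# One sample record; every separator written as a tab here, and the
# request path is a shortened placeholder, not a real one.
line = '2017-11-19\t15:14:33\t95.90.212.131\tGET\t/some/path/pic.JPG\t200\t1841\t0\t"-"\t"-"\t"-"'

# split() on any whitespace yields the eleven fields the grok match is
# supposed to capture: date, time, clientip, httpmethod, requestedpage,
# responsecode, bytessent, timetaken, csreferrer, csuseragent, cscookie.
fields = line.split()
print(len(fields))  # -> 11
```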
When I start the pipeline, Logstash maxes out every CPU core just parsing the logs. It's not a huge amount of data; we're talking about roughly 1 million lines.
Do I have a bad filter?
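While writing this up I tried to approximate the first two grok patterns in plain Python (the expansions are copied by hand from the grok-patterns file and simplified, so treat this as a sketch, not as what grok actually compiles):

```python
import re

# Hand-expanded, simplified approximations of the grok patterns
# (supersets of the real ones, so if these fail, the real ones fail too):
DATE_EU = r'\d{1,2}[./-]\d{1,2}[./-]\d{4}'   # %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
TIME    = r'\d{1,2}:\d{2}(?::\d{2})?'        # rough %{TIME}

# The start of my match string: %{DATE_EU:date}\t%{TIME:time}
pattern = re.compile(DATE_EU + r'\t' + TIME)

# A sample line, separators written as tabs, path shortened to a placeholder:
line = '2017-11-19\t15:14:33\t95.90.212.131\tGET\t/some/path/pic.JPG\t200\t1841\t0\t"-"\t"-"\t"-"'

# My dates are year-first (2017-11-19), but DATE_EU expects day-first,
# so the match already fails on the very first field:
print(pattern.match(line))  # -> None
```

If that approximation is right, every single line is a grok failure, and I wonder whether that mismatch (plus all the %{DATA} captures backtracking on failed lines) is what's burning the CPU.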