I'm using LogStash to process AWS Cloudfront logs and I've got a bit of a problem with the log headers.
At the start of each log file are two header lines before the main log-data itself. Both start with # and take the format:
#Version: 1.0
#Fields:
I added a match before my grok filter and a check for a parser error after but it only drops one of the two lines, not both. This results in having a single entry per log file containing one of the header value as "message".
# drop anything starting with #
if [message] =~ /^#/ {
drop{}
}
grok {
match => {
"message" => "%{DATE_EU:date}[\t]%{TIME:time}[\t](?<edge_location>\b[\w\-]+\b)[\t](?:%{INT:resp_bytes}|-)[\t]%{IPORHOST:client_ip}[\t]%{WORD:req_method}[\t]%{HOSTNAME:cf_host}[\t]%{URIPATH:req_path}[\t]%{INT:resp_status}[\t](?:%{URI:referrer}|-)[\t]%{NOTSPACE:User_Agent}[\t]%{NOTSPACE:req_query}[\t]%{NOTSPACE:req_cookies}[\t]%{WORD:edge_resp_type}[\t]%{NOTSPACE:req_id}[\t]%{HOSTNAME:req_hostname}[\t]%{URIPROTO:req_protocol}[\t]%{INT:req_bytes}[\t]%{NUMBER:time_taken:float}[\t]%{NOTSPACE:x_forwarded_for}[\t]%{NOTSPACE:tls_ver}[\t]%{NOTSPACE:tls_cipher}[\t]%{WORD:edge_response_result_type}[\t]%{NOTSPACE:req_protocol_ver}[\t]%{NOTSPACE:fle_status}[\t]%{NOTSPACE:fle_encrypted_fields}"
}
}
# drop lines we can't parse
if "_grokparsefailure" in [tags] {
drop{}
}
I've tried putting the filter after the grok match and the tag match but it still only filters out one of the headers.
Can anyone shed any light?
