I'm using LogStash to process AWS Cloudfront logs and I've got a bit of a problem with the log headers.
At the start of each log file are two header lines before the main log-data itself. Both start with # and take the format:
```
#Version: 1.0
#Fields:
```
I added a match before my grok filter and a check for a parser error after it, but it only drops one of the two lines, not both. This results in a single entry per log file containing one of the header values as "message".
```conf
filter {
  # drop the two CloudFront header lines, which both start with #
  if [message] =~ /^#/ {
    drop {}
  }

  grok {
    match => {
      "message" => "%{DATE_EU:date}[\t]%{TIME:time}[\t](?<edge_location>\b[\w\-]+\b)[\t](?:%{INT:resp_bytes}|-)[\t]%{IPORHOST:client_ip}[\t]%{WORD:req_method}[\t]%{HOSTNAME:cf_host}[\t]%{URIPATH:req_path}[\t]%{INT:resp_status}[\t](?:%{URI:referrer}|-)[\t]%{NOTSPACE:User_Agent}[\t]%{NOTSPACE:req_query}[\t]%{NOTSPACE:req_cookies}[\t]%{WORD:edge_resp_type}[\t]%{NOTSPACE:req_id}[\t]%{HOSTNAME:req_hostname}[\t]%{URIPROTO:req_protocol}[\t]%{INT:req_bytes}[\t]%{NUMBER:time_taken:float}[\t]%{NOTSPACE:x_forwarded_for}[\t]%{NOTSPACE:tls_ver}[\t]%{NOTSPACE:tls_cipher}[\t]%{WORD:edge_response_result_type}[\t]%{NOTSPACE:req_protocol_ver}[\t]%{NOTSPACE:fle_status}[\t]%{NOTSPACE:fle_encrypted_fields}"
    }
  }

  # drop lines grok could not parse (they carry the _grokparsefailure tag)
  if "_grokparsefailure" in [tags] {
    drop {}
  }
}
```
I've tried putting the filter after the grok match and the tag match but it still only filters out one of the headers.
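The following is an added Python example, not code from the poster or from Logstash, that mirrors the pipeline above outside Logstash: it drops the `#Version` and `#Fields` header lines, takes the column names from the `#Fields:` header instead of hard-coding them, and sets aside any row whose column count does not match (the role the `_grokparsefailure` tag plays in the grok stage). It also strips a UTF-8 byte-order mark before the `^#` test, since a BOM in front of the very first line of a file is one general reason a leading-`#` match can fail on that line only. The sample log is constructed; the field list follows the CloudFront standard-log header layout and the values are invented.

```python
from typing import Dict, List, Tuple

# Constructed sample in the CloudFront standard-log layout: two '#' header
# lines, then one tab-separated record per line. All values are invented.
SAMPLE_LOG = (
    "#Version: 1.0\n"
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) "
    "cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) "
    "x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes "
    "time-taken x-forwarded-for ssl-protocol ssl-cipher "
    "x-edge-response-result-type cs-protocol-version fle-status "
    "fle-encrypted-fields\n"
    "2019-12-04\t21:02:31\tLAX1-C1\t392\t192.0.2.100\tGET\t"
    "d111111abcdef8.cloudfront.net\t/index.html\t200\t-\tMozilla/5.0\t-\t-\t"
    "Hit\tSOY9EXAMPLEREQUESTID\td111111abcdef8.cloudfront.net\thttps\t23\t"
    "0.001\t-\tTLSv1.2\tECDHE-RSA-AES128-GCM-SHA256\tHit\tHTTP/2.0\t-\t-\n"
    "this line is garbage\n"
)


def is_header(line: str) -> bool:
    """Equivalent of the Logstash check `[message] =~ /^#/`.

    A UTF-8 BOM (U+FEFF) ahead of the first line would make a '^#' match
    fail on that line alone, so it is stripped before testing.
    """
    return line.lstrip("\ufeff").startswith("#")


def parse_cloudfront(text: str) -> Tuple[List[Dict[str, str]], List[str], int]:
    """Parse a CloudFront standard log.

    Returns (records, unparsable_lines, header_lines_dropped). Column names
    come from the '#Fields:' header rather than a hard-coded pattern, so a
    row whose column count differs from the header is reported instead of
    being mis-assigned; that is the job _grokparsefailure does in Logstash.
    """
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    unparsable: List[str] = []
    headers_dropped = 0

    for raw in text.splitlines():
        line = raw.lstrip("\ufeff")
        if not line:
            continue
        if is_header(line):
            headers_dropped += 1
            if line.startswith("#Fields:"):
                # Everything after the label is the whitespace-separated column list.
                fields = line[len("#Fields:"):].split()
            continue
        cols = line.split("\t")
        if not fields or len(cols) != len(fields):
            unparsable.append(line)
            continue
        records.append(dict(zip(fields, cols)))

    return records, unparsable, headers_dropped


if __name__ == "__main__":
    recs, bad, dropped = parse_cloudfront(SAMPLE_LOG)
    print(f"dropped {dropped} header line(s), "
          f"{len(recs)} record(s), {len(bad)} unparsable line(s)")
    for r in recs:
        print(r["date"], r["time"], r["cs-method"], r["cs-uri-stem"], r["sc-status"])
```

Running it on the constructed sample reports two header lines dropped, one parsed record and one unparsable line; feeding it a real CloudFront log file (decompressed) shows whether both header lines are being recognised as headers, which is the first thing to confirm when a pipeline keeps one of them.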