Problem dropping multiple log lines with match statement

I'm using Logstash to process AWS CloudFront logs and I've got a bit of a problem with the log headers.

At the start of each log file are two header lines before the main log-data itself. Both start with # and take the format:

#Version: 1.0
#Fields:

I added a conditional before my grok filter and a check for a parse failure after it, but only one of the two lines gets dropped, not both. This results in a single entry per log file containing one of the header values as "message".

# drop anything starting with #
if [message] =~ /^#/ {
  drop{}
}

grok {
  match => {
    "message" => "%{DATE_EU:date}[\t]%{TIME:time}[\t](?<edge_location>\b[\w\-]+\b)[\t](?:%{INT:resp_bytes}|-)[\t]%{IPORHOST:client_ip}[\t]%{WORD:req_method}[\t]%{HOSTNAME:cf_host}[\t]%{URIPATH:req_path}[\t]%{INT:resp_status}[\t](?:%{URI:referrer}|-)[\t]%{NOTSPACE:User_Agent}[\t]%{NOTSPACE:req_query}[\t]%{NOTSPACE:req_cookies}[\t]%{WORD:edge_resp_type}[\t]%{NOTSPACE:req_id}[\t]%{HOSTNAME:req_hostname}[\t]%{URIPROTO:req_protocol}[\t]%{INT:req_bytes}[\t]%{NUMBER:time_taken:float}[\t]%{NOTSPACE:x_forwarded_for}[\t]%{NOTSPACE:tls_ver}[\t]%{NOTSPACE:tls_cipher}[\t]%{WORD:edge_response_result_type}[\t]%{NOTSPACE:req_protocol_ver}[\t]%{NOTSPACE:fle_status}[\t]%{NOTSPACE:fle_encrypted_fields}"
  }
}

# drop lines we can't parse
if "_grokparsefailure" in [tags] {
  drop{}
}

I've tried moving the # conditional after the grok filter and after the tag check, but it still only filters out one of the headers.

Can anyone shed any light?

Can you provide a test message?

I've tested this with my filters and it drops both the #Version and #Fields lines.
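
For anyone wanting to repeat that check, a minimal pipeline along these lines should be enough to exercise the conditional on its own. It is only a sketch: the stdin input, the rubydebug output and the file name test.conf are assumptions made for testing, not the original poster's actual configuration.

```
# test.conf (hypothetical file name): checks the header-dropping conditional in isolation
input {
  # read test lines typed or pasted into the terminal
  stdin {}
}

filter {
  # drop anything starting with #
  if [message] =~ /^#/ {
    drop {}
  }
}

output {
  # print every event that survives the filter
  stdout { codec => rubydebug }
}
```

Run it with bin/logstash -f test.conf and paste in the two header lines followed by a real log line. Only events whose message does not start with # should be printed, which helps separate a problem with the conditional from a problem elsewhere in the pipeline.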

Not much to share really; the start of each ingested log file looks like this:

#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields
<grokable log lines follow>

What I'm finding in my indices is the following:

{
  "_index": "cloudfront-prod-logs-2019.08.29",
  "_type": "doc",
  "_id": "%{[req_id]}",
  "_version": 476,
  "_score": null,
  "_source": {
    "message": "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields\n",
    "@timestamp": "2019-08-29T12:54:08.189Z",
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2019-08-29T12:54:08.189Z"
    ]
  },
  "sort": [
    1567083248189
  ]
}

So it's catching the first line starting with # but not the second, which ends up inserted into my index.
