Problem dropping multiple log lines with match statement

Neje · August 28, 2019, 3:01pm

I'm using Logstash to process AWS CloudFront logs and I've got a bit of a problem with the log headers.

At the start of each log file there are two header lines before the log data itself. Both start with # and take the format:

#Version: 1.0
#Fields:

I added a conditional before my grok filter and a check for a parse failure after it, but only one of the two lines is dropped, not both. The result is a single entry per log file containing one of the header lines as "message".

# drop anything starting with #
if [message] =~ /^#/ {
  drop{}
}

grok {
  match => {
    "message" => "%{DATE_EU:date}[\t]%{TIME:time}[\t](?<edge_location>\b[\w\-]+\b)[\t](?:%{INT:resp_bytes}|-)[\t]%{IPORHOST:client_ip}[\t]%{WORD:req_method}[\t]%{HOSTNAME:cf_host}[\t]%{URIPATH:req_path}[\t]%{INT:resp_status}[\t](?:%{URI:referrer}|-)[\t]%{NOTSPACE:User_Agent}[\t]%{NOTSPACE:req_query}[\t]%{NOTSPACE:req_cookies}[\t]%{WORD:edge_resp_type}[\t]%{NOTSPACE:req_id}[\t]%{HOSTNAME:req_hostname}[\t]%{URIPROTO:req_protocol}[\t]%{INT:req_bytes}[\t]%{NUMBER:time_taken:float}[\t]%{NOTSPACE:x_forwarded_for}[\t]%{NOTSPACE:tls_ver}[\t]%{NOTSPACE:tls_cipher}[\t]%{WORD:edge_response_result_type}[\t]%{NOTSPACE:req_protocol_ver}[\t]%{NOTSPACE:fle_status}[\t]%{NOTSPACE:fle_encrypted_fields}"
  }
}

# drop lines we can't parse
if "_grokparsefailure" in [tags] {
  drop{}
}

I've tried moving the # conditional so that it comes after the grok filter and the tag check, but it still only filters out one of the headers.

Can anyone shed any light?

naveenrt23 · August 28, 2019, 3:29pm

Can you provide a test message?

naveenrt23 · August 28, 2019, 3:39pm

I've tested this with my filters and it drops both the #Version and #Fields lines.
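
A test like this can be reproduced without touching the real pipeline. The following is a minimal sketch, assuming a throwaway config file (the name test.conf is hypothetical) that reads sample lines from stdin and prints whatever survives the filter to stdout:

```
# test.conf (hypothetical file name): feed sample lines on stdin
input {
  stdin {}
}

filter {
  # drop anything starting with #
  if [message] =~ /^#/ {
    drop {}
  }
}

output {
  # print every event that survives the filter
  stdout { codec => rubydebug }
}
```

Running bin/logstash -f test.conf and pasting the two header lines should print nothing if both are being dropped. Any event that does appear shows exactly what the conditional saw in [message].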

Neje · August 30, 2019, 8:18am

Not much to share really. The start of the log ingest looks like this:

#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields
<grokable log lines follow>

What I'm finding in my indices is the following:

{
  "_index": "cloudfront-prod-logs-2019.08.29",
  "_type": "doc",
  "_id": "%{[req_id]}",
  "_version": 476,
  "_score": null,
  "_source": {
    "message": "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields\n",
    "@timestamp": "2019-08-29T12:54:08.189Z",
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2019-08-29T12:54:08.189Z"
    ]
  },
  "sort": [
    1567083248189
  ]
}

So it's catching the first line that starts with # but not the second, which is then inserted into my index.
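
One detail in the document above may be relevant: its _id is the literal string %{[req_id]} and its _version is 476. Logstash leaves a %{...} reference unexpanded when the field does not exist on the event, so this looks like an output that sets document_id from req_id and receives events that never got that field. The output section is not shown in this thread, so the following is only a sketch under that assumption. The hosts value and the index pattern are guesses, not the actual settings in use:

```
output {
  # only index events that grok actually parsed, i.e. that have req_id
  if [req_id] {
    elasticsearch {
      hosts => ["localhost:9200"]                        # assumed host
      index => "cloudfront-prod-logs-%{+YYYY.MM.dd}"     # assumed pattern
      document_id => "%{[req_id]}"
    }
  }
}
```

With a guard like this, an event lacking req_id (such as an unparsed header line) would not be indexed at all, rather than repeatedly overwriting the same document.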

system · September 27, 2019, 8:18am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
