Match variants of logs

lwhitworth · January 5, 2016, 2:35pm

Hey all,

I'm ingesting some logs whose lines all start the same way (a timestamp) but then contain multiple fields in varying orders, e.g.

File 1:

Tue Jan 5 13:01:21 2016 Packet-Type = Access-Request NAS-Port-Id = "AA111/2" Calling-Station-Id = "FF-FF-FF-FF-FF-FF" Called-Station-Id = "AA-AA-AA-AA-AA-AA:wireless" Service-Type = Framed-User User-Name = "bob@bob.com"

Tue Jan 5 13:02:21 2016 Packet-Type = Access-Request NAS-Port-Id = "AA111/2" User-Name = "bob@bob.com" Calling-Station-Id = "FF-FF-FF-FF-FF-FF" Service-Type = Framed-User Called-Station-Id = "AA-AA-AA-AA-AA-AA:wireless"

Tue Jan 5 13:03:21 2016 Packet-Type = Access-Request NAS-Port-Id = "AA111/2" Calling-Station-Id = "FF-FF-FF-FF-FF-FF" Service-Type = Framed-User User-Name = "bob@bob.com"

Ideally I'd like a match rule that lists every possible field as optional. That would match every line IF the fields always appeared in the same order, but they don't, and that's where I'm stuck. Does anyone know a way to achieve this without having to resort to a match like (field_a|field_b|field_c) (field_a|field_b|field_c)? (field_a|field_b|field_c)? (field_a|field_b|field_c)?

Cheers

magnusbaeck · January 5, 2016, 2:45pm

The kv filter would be ideal for this, except that it doesn't support multi-character separators between key and value (and in your case there is a space on each side of the equal sign). But perhaps you could use the mutate filter's gsub option to replace all occurrences of " = " with a plain "=" and then feed the result to the kv filter?
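A minimal sketch of that suggestion, assuming the whole record is in the `message` field and that the kv filter's defaults (a space between pairs, `=` between key and value) are otherwise acceptable:

```
filter {
  # Collapse " = " into "=" so kv's single-character value separator matches
  mutate {
    gsub => [ "message", " = ", "=" ]
  }
  # Every key=value pair becomes a field, whatever order the pairs appear in
  kv {
    source => "message"
  }
}
```

Because kv scans the message for pairs instead of matching a fixed sequence, a missing or reordered attribute does not make the event fail to parse, which is exactly what an "every field optional" grok pattern struggles with. Later releases of the kv plugin added options such as `value_split_pattern` that may make the gsub step unnecessary; check the kv documentation for your Logstash version.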

lwhitworth · January 5, 2016, 2:46pm

Cheers, I'll have a look into it and see how far I get

lwhitworth · January 5, 2016, 4:00pm

Absolutely spot on, cheers. Got it all working with:

filter {
  # Drop empty (or whitespace-only) lines from the logs
  if [type] == "radiusauth" and [message] =~ /^\s*$/ {
    drop { }
  }
  if [type] == "radiusauth" {
    # Lines starting with whitespace are continuations of the line before,
    # so merge them into a single event (the lines are joined with newlines)
    multiline {
      pattern => "^\s"
      what => "previous"
    }
    # Sanitise the message: collapse " = " to "=" and strip double quotes
    mutate {
      gsub => [
        "message", " = ", "=",
        "message", "\"", ""
      ]
    }
    # Pull the timestamp out of the first line of the record
    grok {
      match => { "message" => "(?m)%{DAY:radiusauth_day} %{MONTH:radiusauth_month}%{SPACE}%{MONTHDAY:radiusauth_monthday} %{TIME:radiusauth_time} %{YEAR:radiusauth_year}" }
      add_field => {
        "received_at" => "%{@timestamp}"
        "received_from" => "%{host}"
        "radiusauth_timestamp" => "%{radiusauth_month} %{radiusauth_monthday} %{radiusauth_time} %{radiusauth_year}"
      }
      add_tag => [ "radiusauth_auth" ]
    }
    # Turn every key=value pair into a field, whatever order they appear in
    kv {
      source => "message"
      field_split => "\s"
    }
    # radiusauth_timestamp is built with single spaces and includes the year,
    # e.g. "Jan 5 13:01:21 2016", so the year does not have to be guessed
    date {
      match => [ "radiusauth_timestamp", "MMM d HH:mm:ss yyyy" ]
    }
  }
}

Very much appreciated, sir.
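In later Logstash releases the multiline filter is deprecated in favour of the multiline codec, which does the joining at the input instead. A minimal sketch of the equivalent input, assuming a file input; the path below is a hypothetical placeholder:

```
input {
  file {
    # Hypothetical path: point this at the real RADIUS log files
    path => "/var/log/radius/auth-detail"
    type => "radiusauth"
    # Same rule as the multiline filter in the config above:
    # indented lines belong to the line before them
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}
```

With the codec doing the merge, the `multiline` block can be removed from the filter section, and the remaining filters should be able to stay as they are.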

lwhitworth · January 5, 2016, 4:21pm

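After a bit of testing, the following works better. Splitting on \s also breaks apart values that legitimately contain spaces, including double spaces (e.g. some date records). Since the multiline filter leaves each attribute on its own line, splitting on line breaks instead keeps those values intact: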

kv {
  source => "message"
  field_split => "\r\n"
}
