Can't parse multiline logs with grok


(Thorsten Peter) #1

Hi Team,
I am currently trying to load multiline log files with Logstash, but for some reason I can't get more than the first line. I know this topic has been discussed several times, but none of the solutions work in my case.

Some of my logs look like:

[4428] Restart #34, SysUpTime 251d 16:46:50
01/09/2018 17:56:15 (t)
(R)STP topology change detected while (R)STP is off.

[4427] Restart #34, SysUpTime 251d 16:46:48
01/09/2018 17:56:13 (t)
(R)STP topology change detected while (R)STP is off.

[4426] Restart #34, SysUpTime 251d 16:46:46
01/09/2018 17:56:11 (t)
(R)STP topology change detected while (R)STP is off.

As far as I understand, my problem should be the grok filter. I can load the data into Kibana in matching "blocks" (if I have 10 blocks in my log, I get 10 entries in Kibana), but I get the tags _grokparsefailure and _dateparsefailure, which means my filter doesn't match my document. The input and filter sections of my pipeline look like this:

input {
    file {
        path => "/home/bitnami/logs/Logdateien/SCALANCE/SC*.txt"
        codec => multiline {
            pattern => "^\S"
            negate => true
            what => "next"
        }
        start_position => beginning
        sincedb_path => "/dev/null"
        ignore_older => 0
    }
}
filter {
    grok {
        match => { "message" => "\[%{NUMBER:Nummer}\] Restart #\d\d, SysUpTime %{NUMBER:Uptime}d \d\d:\d\d:\d\d\n%{DATE_US:Datum} %{TIME:Zeit} \(t\)\n%{GREEDYDATA:Message}" }
    }
    mutate {
        add_field => {
            "timestamp" => "%{Datum} %{Zeit}"
        }
    }
    date {
        match => [ "timestamp", "MM/dd/yyyy HH:mm:ss" ]
        locale => "en"
    }
    mutate {
        remove_field => [ "timestamp", "message" ]
    }
}

I used the "Online Regex Tester" as well as the "Grok Constructor" to see if it's a syntax problem, but both work fine and say my config should work.
I have also tried (\n|\r)* instead of just \n, which did not change anything.
I put (?m) in front as mentioned in other posts, and I tried replacing %{GREEDYDATA:Message} with (?<Message>(.|\r|\n)*), which did not work either.
If I reduce my grok filter to parse just the first line (without \n in the pattern), it works fine, except that the rest of the log message, including the date, is not parsed, which leads to _dateparsefailure. Because of that I assume the problem lies in the \n.

To be more specific:
grok {
    match => { "message" => "\[%{NUMBER:Nummer}\] Restart #\d\d, SysUpTime %{NUMBER:Uptime}d \d\d:\d\d:\d\d" }
}

I hope I have described my problem sufficiently.


(Attila Boncok) #2

My assumption is that grok fails because the multiline codec doesn't assemble your lines the way you want. Your grok pattern assumes the lines have been combined into one event; if they haven't, it will fail.

Let's take a look at the multiline.

pattern => "^\S"
You're matching every line that starts with a non-whitespace character.

negate => true
You negate the above, so you're selecting every line that does not start with a non-whitespace character, i.e. lines that start with whitespace and empty lines.

what => "next"
You combine the selected lines (i.e. the ones that start with whitespace or are empty) with the next one.

With the above, I assume your documents' message field will look like this:

[4428] Restart #34, SysUpTime 251d 16:46:50

01/09/2018 17:56:15 (t)

(R)STP topology change detected while (R)STP is off.


[4427] Restart #34, SysUpTime 251d 16:46:48

01/09/2018 17:56:13 (t)

(R)STP topology change detected while (R)STP is off.


[4426] Restart #34, SysUpTime 251d 16:46:46

01/09/2018 17:56:11 (t)

(R)STP topology change detected while (R)STP is off.

So basically, you combined every blank line with the line after it. The other lines remained unaffected and are handled as individual documents.
You can check this by looking at the message field of the indexed documents.
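
If you'd rather not go through Kibana for this, one option is to print every event to the console. This is only a minimal sketch, assuming you can temporarily add a stdout output to your pipeline:

```
output {
    # Temporary debugging output: prints each event with all of its fields,
    # so you can see whether "message" holds a single line or a whole block.
    stdout { codec => rubydebug }
}
```

If the message of each event holds only a single log line, a grok pattern that expects \n between the header line and the date line cannot match it.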

How I would try it:

pattern => "^\[[0-9]+\]"
negate => true
what => "previous"

The pattern matches each line that starts with the event ID in brackets, e.g. [4426].
negate => true means the rule applies to the lines that do not match the pattern. A line that does match is the first line of a multiline event: it is not appended anywhere, and the following lines are appended to it.
what => "previous" tells Logstash to append the selected lines (i.e. those that don't start with an event ID) to the previous line.

All of this appends each line that doesn't start with an event ID to the previous line, so you should receive documents like this:

[4428] Restart #34, SysUpTime 251d 16:46:50
01/09/2018 17:56:15 (t)
(R)STP topology change detected while (R)STP is off.

[4427] Restart #34, SysUpTime 251d 16:46:48
01/09/2018 17:56:13 (t)
(R)STP topology change detected while (R)STP is off.

etc.
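
Put together, the input section might look like the sketch below. The path, start_position and sincedb_path are copied from the first post. auto_flush_interval is an addition of mine (an assumption, not part of the fix itself), so that the last block of a file is emitted even though no further line starting with an event ID follows it.

```
input {
    file {
        path => "/home/bitnami/logs/Logdateien/SCALANCE/SC*.txt"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
            # A line starting with an event ID such as [4426] opens a new event.
            pattern => "^\[[0-9]+\]"
            # Every line that does NOT match the pattern ...
            negate => true
            # ... is appended to the event that was opened before it.
            what => "previous"
            # Assumption: flush a pending event after 2 seconds without new lines.
            auto_flush_interval => 2
        }
    }
}
```

Note that the blank line between two blocks doesn't match the pattern either, so it should end up as a trailing newline on the preceding event. The grok pattern from the first post is not anchored at the end, so that should not get in the way.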


(Thorsten Peter) #3

Thank you for the quick response.
I guess I searched for 2 days in the wrong place... I thought the multiline input was working because I had as many results as blocks, but I never thought about what you mentioned.
Anyhow, it works fine now, and I understand the functionality of the multiline parameters much better.
Thank you very much and have a nice day :)


(Attila Boncok) #4

You're welcome :)
Don't forget to mark a solution if you found one; it helps other users browsing for solutions.

