Tricky Multiline pattern/configuration


#1

Hi, I have a very tricky situation: two types of multiline messages in one file and, of course, no application support to change it :slight_smile:

Example 1:
20170930 14:20:01.003 message 1 line 1
message 1 line 2 
20170930 14:20:01.004 message 2

Example 2:
20170930 14:23:04.013 message 1 line 1
20170930 14:23:04.014 message 1 executed in 99ms

We need two different multiline patterns. One has to be negated (every line that doesn't look like "message 1 line 1" should be treated as a continuation line), and the other should not be negated (every line that contains "executed in 99ms" should be treated as a continuation line as well). Is there a way to achieve that? The easiest way would be one negated pattern and one non-negated pattern.

Thanks a lot for your help!


(Noémi Ványi) #2

Hi!

You can specify two multiline patterns by connecting them using |, which stands for "or".
So you could write '[^negated-pattern]|pattern'. You can test your multiline regex using this Go Playground: https://play.golang.org/p/uAd5XHxscu
More info on how to test it: https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html#_testing_your_regexp_pattern_for_multiline

Let us know if you need further help.


#4

Hi!

Thank you for your answer! Unfortunately, I don't think it works like this. I am using the following pattern:
^[^20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])\s([0-1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9].[0-9]{3}\s(message)\s\d+\s(executed in)\s\d+(ms)]|^20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])\s([0-1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9].[0-9]{3}\s(message)\s\d+\s(line)\s\d+

But it doesn't match the following line:
20170930 14:23:04.014 message 1 executed in 99ms

It matches all the lines that do not look like this line:
20170930 14:20:01.003 message 1 line 1

Here is the link to the Go Playground: https://play.golang.org/p/Vo2AJjPzaN

Thanks a lot for your help!


(Steffen Siering) #5

Indeed quite tricky. The [^...] clause does not negate a pattern, but negates a character class.
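
To illustrate the difference, here is a minimal Go sketch (the helper name firstMatch is made up for this example). It shows that [^20] stands for exactly one character that is neither '2' nor '0', rather than "anything that does not start with 20":

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it returns the first substring of s
// matched by pattern, or "" when nothing matches.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// The first character is '2', which the class excludes, so nothing matches.
	fmt.Printf("%q\n", firstMatch(`^[^20]`, "20170930 14:20:01.003"))

	// The class consumes a single character only: the leading 'm'.
	fmt.Printf("%q\n", firstMatch(`^[^20]`, "message 1 line 2"))

	// With a quantifier it consumes a run of characters that are not '2' or '0'.
	fmt.Printf("%q\n", firstMatch(`^[^20]+`, "1720"))
}
```

Everything placed between [^ and the closing ] is read as a set of single characters, which is why wrapping a whole date pattern in [^...] does not negate that pattern.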

My attempt uses the pattern (^[^\d]{8}|executed in) with negate false. This regex checks whether the log line starts with 8 non-digit characters (that is, it does not start with the date) or contains the substring "executed in". See the playground: https://play.golang.org/p/0TcDlVySS3
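
For readers who cannot open the playground, the following is a small self-contained sketch of the same check, run against the sample lines from post #1. The function name isContinuation is an assumption for this example and is not part of Filebeat:

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern taken from the post above; with negate false, a matching line
// is treated as a continuation of a multiline event.
var continuation = regexp.MustCompile(`(^[^\d]{8}|executed in)`)

// isContinuation is a hypothetical helper that reports whether line matches.
func isContinuation(line string) bool {
	return continuation.MatchString(line)
}

func main() {
	lines := []string{
		"20170930 14:20:01.003 message 1 line 1",           // starts with the date
		"message 1 line 2 ",                                // starts with 8 non-digits
		"20170930 14:20:01.004 message 2",                  // starts with the date
		"20170930 14:23:04.013 message 1 line 1",           // starts with the date
		"20170930 14:23:04.014 message 1 executed in 99ms", // contains "executed in"
	}
	for _, l := range lines {
		fmt.Printf("continuation=%v line=%q\n", isContinuation(l), l)
	}
}
```

One limitation worth noting: [^\d]{8} needs at least 8 leading non-digit characters, so a continuation line that is shorter than that, or that has a digit among its first 8 characters, would not be matched by the first alternative.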


#6

Hi!

I think I got it working! Thank you for your help :grimacing:


#7

Hi again!

Is it possible to use more than one negated class?


(Steffen Siering) #8

You can add any character you like, I think. But the more characters you add to the character class, the higher the chance of false positives. What's the issue?


#9

For example, if I use a pattern like this (var negate = true):
(^[^\d]{3}[^\s][^\d]{3})
The following lines should be matched:

200 300

But these are matched as well, for example:

200
asdf
2603
1

What is wrong with my pattern? Link to playground: https://play.golang.org/p/9J9vZqvzQz


(Steffen Siering) #10

Hmm... I don't fully get it.

First of all, it seems you have a mix of quite a few patterns in your log file (yeah). Having all patterns in the playground, in order to test and play with them, helps in creating a more robust regex (if possible). Please add a bigger corpus of potential log messages.

Is it always two numbers of 3 digits each, or are there other 'similar' patterns?

I kind of see what you are trying to do with your pattern. You try to negate a complete sequence by negating the individual character classes. This is not possible, as one cannot negate complete terms with regular expressions. Given your sample, the closest I can think of is ^[\d ]{7}. But this matches any sequence of digits and spaces at the start of the line with a total length of 7 bytes.
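
A short Go sketch comparing the two patterns on the sample lines from post #9 may make this concrete. The variable names are made up for this example, and the negate handling is only described in comments rather than simulated:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern from post #9: each class consumes one character, so it needs
	// 3 non-digits, 1 non-whitespace, then 3 non-digits at the line start.
	askerPattern = regexp.MustCompile(`(^[^\d]{3}[^\s][^\d]{3})`)

	// Pattern suggested above: 7 leading characters, each a digit or a space.
	suggestedPattern = regexp.MustCompile(`^[\d ]{7}`)
)

func main() {
	lines := []string{"200 300", "200", "asdf", "2603", "1", "2001234"}
	for _, l := range lines {
		fmt.Printf("%-9q asker=%-5v suggested=%v\n",
			l, askerPattern.MatchString(l), suggestedPattern.MatchString(l))
	}
}
```

None of the sample lines match the pattern from post #9, so once the result is negated every one of them counts as a match, which is the behaviour reported there. The suggested pattern accepts "200 300" and rejects the four unwanted samples, but it also accepts "2001234", which illustrates the false-positive risk mentioned above.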

