Hi, I have a very tricky situation: two types of multiline messages in one file and, of course, no application support to change it.
Example 1:
20170930 14:20:01.003 message 1 line 1
message 1 line 2
20170930 14:20:01.004 message 2
Example 2:
20170930 14:23:04.013 message 1 line 1
20170930 14:23:04.014 message 1 executed in 99ms
We need two different multiline patterns. One has to be negated (every line that doesn't look like message 1 line 1 should be treated as a continuation line) and the other one should not be negated (every line that contains "executed in 99ms" should be treated as a continuation line as well). Is there a way to achieve that? The easiest way would be one negated pattern and one non-negated pattern.
Thank you for your answer! Unfortunately, I think it does not work like this. I am using the following pattern: ^[^20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])\s([0-1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9].[0-9]{3}\s(message)\s\d+\s(executed in)\s\d+(ms)]|^20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])\s([0-1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9].[0-9]{3}\s(message)\s\d+\s(line)\s\d+
But it doesn't match the following line: 20170930 14:23:04.014 message 1 executed in 99ms
It matches all the lines that do not look like this line: 20170930 14:20:01.003 message 1 line 1
Indeed quite tricky. The [^...] clause does not negate a pattern, but negates a character class.
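To make the character-class point concrete, here is a minimal Go sketch. The pattern ^[^20] and the third sample line are made up for illustration and do not come from the log in question. It shows that [^20] only means "one character that is neither 2 nor 0", not "anything that is not the sequence 20".

```go
package main

import (
	"fmt"
	"regexp"
)

// notTwoOrZero is an illustrative pattern (an assumption, not taken from
// the thread): it matches a line whose FIRST character is neither '2' nor '0'.
var notTwoOrZero = regexp.MustCompile(`^[^20]`)

func main() {
	lines := []string{
		"20170930 14:20:01.003 message 1 line 1", // starts with '2': no match
		"message 1 line 2",                       // starts with 'm': match
		"19991231 23:59:59.999 message 3",        // hypothetical line, starts with '1': match
	}
	for _, line := range lines {
		fmt.Printf("%-5v %s\n", notTwoOrZero.MatchString(line), line)
	}
}
```

The third line starts with a date too, yet it still matches, because only a single character is compared against the set.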
My attempt uses the pattern (^[^\d]{8}|executed in) with negate set to false. This regex matches a log line that starts with 8 non-digit characters (so it does not start with the date) or that contains the substring executed in. See the playground: https://play.golang.org/p/0TcDlVySS3
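In case the playground link is not reachable, here is a rough sketch of the same idea. The groupEvents helper is hypothetical and only imitates "append matching lines to the previous event"; it is not how Filebeat implements multiline internally. The input is the sample lines from the first post.

```go
package main

import (
	"fmt"
	"regexp"
)

// continuation is the pattern suggested above: a line is a continuation if
// it starts with 8 non-digit characters or contains "executed in".
var continuation = regexp.MustCompile(`(^[^\d]{8}|executed in)`)

// groupEvents is a hypothetical helper that appends every line matching
// continuation to the preceding event (negate false, match after).
func groupEvents(lines []string) []string {
	var events []string
	for _, line := range lines {
		if len(events) > 0 && continuation.MatchString(line) {
			events[len(events)-1] += "\n" + line
			continue
		}
		events = append(events, line)
	}
	return events
}

func main() {
	lines := []string{
		"20170930 14:20:01.003 message 1 line 1",
		"message 1 line 2",
		"20170930 14:20:01.004 message 2",
		"20170930 14:23:04.013 message 1 line 1",
		"20170930 14:23:04.014 message 1 executed in 99ms",
	}
	for i, ev := range groupEvents(lines) {
		fmt.Printf("--- event %d ---\n%s\n", i+1, ev)
	}
}
```

With these five lines the sketch yields three events: the two-line message from Example 1, the single-line message 2, and the pair from Example 2.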
You can add any character you like, I think. But the more characters you add to the character class, the higher the chance of false positives. What's the issue?
First of all, it seems you have quite a mix of patterns in your log file (yeah). Having all patterns in the playground, so you can test and play with them, helps in creating a more robust regex (if possible). Please add a bigger corpus of potential log messages.
Is it always 2 numbers followed by 3 digits, or are there other 'similar' patterns?
I kind of see what you are trying to do with your pattern. You try to negate a complete sequence by negating the individual character classes. This is not possible, as one cannot negate complete terms with regular expressions. Given your sample, the closest I can think of is ^[\d ]{7}. But this matches any sequence of digits and spaces taking a total of 7 bytes.
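A small Go sketch of that caveat. The third sample line is invented for illustration and is not from the actual log; it shows how ^[\d ]{7} also accepts a line that merely begins with digits and spaces.

```go
package main

import (
	"fmt"
	"regexp"
)

// sevenDigitsOrSpaces is the pattern mentioned above: the first 7 bytes
// of the line must each be a digit or a space.
var sevenDigitsOrSpaces = regexp.MustCompile(`^[\d ]{7}`)

func main() {
	lines := []string{
		"20170930 14:23:04.013 message 1 line 1", // real timestamp: match
		"message 1 line 2",                       // starts with letters: no match
		"12 34 56 unrelated text",                // hypothetical line: also a match
	}
	for _, line := range lines {
		fmt.Printf("%-5v %s\n", sevenDigitsOrSpaces.MatchString(line), line)
	}
}
```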