Parsing multiline logs: line + XML


(carmelo) #1

Below is an example of my logs:

INFO 05-01-16 08:06:01 [http-nio-8080-exec-8] (AbstractServer.java:454) -
<dialogue>
<server>localhost</server>
<duration>311</duration>
[...]
</dialogue>

I was able to parse the XML with XPath, but I don't know how to parse the first line of the log entry.
Logstash is treating the two lines independently.

This is my filter:

if [type] == "oldLogs" {
  grok {
    patterns_dir => "./patterns"
    match => { "message" => "%{LOGLEVEL:level} %{TIMESTAMP_ISO8601:timestamp} \[%{IP:client}\] \(%{JAVACLASS:class}\) - %{GREEDYDATA:msg}" }
  }
  multiline {
    pattern => "<dialogue>"
    what => "previous"
    negate => "true"
  }
  xml {
    store_xml => "false"
    source => "message"
    xpath => [ "/dialogue/server/text()", "server", "/dialogue/duration/text()", "duration" ]
  }
}

I hope someone can help me.

Regards
Carmelo


(Magnus Bäck) #2

Filters are processed in order, so you'll want to join all physical lines of a log entry into a single event first, then use the grok and xml filters to parse that event.

Secondly, the multiline condition won't work. This should work better (it's a bit sloppy, but I'm too lazy to write a stricter regexp):

multiline {
  pattern => "^%{LOGLEVEL} "
  what => "previous"
  negate => "true"
}

After that, the grok filter will extract the XML payload from the event into the msg field, which you'll then process with the xml filter.
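Put together in that order, the filter section could look roughly like the sketch below. It is an illustration under assumptions and has not been tested. The grok pattern is deliberately loose: it skips everything up to the first " -" separator so that the XML payload lands in msg. The target fields server and duration come from the xpath expressions in the original post.

```
filter {
  if [type] == "oldLogs" {
    # 1. Join physical lines: a line that does NOT start with a log level
    #    is appended to the previous event.
    multiline {
      pattern => "^%{LOGLEVEL} "
      what => "previous"
      negate => true
    }
    # 2. Pull the XML payload out of the joined event. (?m) lets "." match
    #    newlines, so GREEDYDATA captures the whole <dialogue> block.
    #    Loose on purpose: it skips everything up to the first " -".
    grok {
      match => { "message" => "(?m)^%{LOGLEVEL:level} .*? -\s*%{GREEDYDATA:msg}" }
    }
    # 3. Run the XPath queries against the payload only, not the header line.
    xml {
      source => "msg"
      store_xml => false
      xpath => {
        "/dialogue/server/text()" => "server"
        "/dialogue/duration/text()" => "duration"
      }
    }
  }
}
```

Once this pipeline produces events, you can tighten the grok pattern so it captures the thread name, file and line number as separate fields.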


(carmelo) #3

Hi Magnus,
thank you for your time.

I changed my multiline filter and now Logstash is reading the logs as a single line.

But the grok and xml filters are not extracting the fields.

This is my new version. Could you give me some tips, please?

if [type] == "oldLogs" {
  multiline {
    pattern => "^%{LOGLEVEL} "
    what => "previous"
    negate => "true"
  }
  grok {
    patterns_dir => "./patterns"
    match => { "message" => "%{LOGLEVEL:level} %{TIMESTAMP_ISO8601:timestamp} \[%{IP:client}\] \(%{JAVACLASS:class}\) - %{GREEDYDATA:msg}" }
  }
  xml {
    store_xml => "false"
    source => "msg"
    xpath => [ "/dialogue/server/text()", "server", "/dialogue/duration/text()", "duration" ]
  }
}


(Magnus Bäck) #4

Your grok expression doesn't match the input: "http-nio-8080-exec-8" isn't an IP address so you can't use the IP grok pattern, and while "AbstractServer.java" should pass as a Java class name you're ignoring the line number that follows.

See also Logstash multiline Bug regarding grokking multiline strings.
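A grok pattern that addresses both points might look like the sketch below. It is untested against the real log, and the field names thread, file, line and msg are illustrative. It uses DATA instead of IP for the bracketed thread name. It splits the file name and line number with JAVAFILE and INT. It also starts with (?m) so that GREEDYDATA can span the newlines inside the joined event, which is the issue behind the multiline bug mentioned above.

```
grok {
  # (?m) makes "." match newlines, so GREEDYDATA captures the whole XML block.
  # TIMESTAMP_ISO8601 accepts "05-01-16 08:06:01" only because its YEAR
  # sub-pattern allows two digits; grok stores the text, it does not parse it.
  match => {
    "message" => "(?m)^%{LOGLEVEL:level} %{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] \(%{JAVAFILE:file}:%{INT:line}\) -\s*%{GREEDYDATA:msg}"
  }
}
```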


(carmelo) #5

Thank you very much for your time.

It is working now.

Regards
Carmelo


(carmelo) #6

Sorry to disturb you again.
It is working quite well: grok and xml are parsing the line. But because I am using your tip to create a single line, my logs now become:
INFO 05-01-16 08:06:01 [http-nio-8080-exec-8] (AbstractServer.java:454) - \n<dialogue>\n<server>localhost</server>\n<duration>311</duration>\n[...]\n</dialogue>\nINFO 05-01-16 08:06:01 [http-nio-8080-exec-8] (AbstractServer.java:454) - \n<dialogue>\n<server>localhost</server>\n<duration>311</duration>\n[...]\n</dialogue>\n[...]

and I don't know how to define the "end line", because the text above contains 2 different log entries.

My filter is:

if [type] == "oldLogs" {
  multiline {
    pattern => "^%{LOGLEVEL}"
    what => "previous"
    negate => "true"
  }
  grok {
    patterns_dir => "./patterns"
    match => ["message", "(?m)%{LOGLEVEL:level} %{TIMESTAMP_ISO8601:timestamp} \[%{PROG:msg_1}\] \(%{JAVAFILE:file}:%{NUMBER:line}\) \-%{GREEDYDATA:msg_3}"]
  }
  xml {
    store_xml => "false"
    source => "msg_3"
    xpath => [ "/dialogue/server/text()", "server", "/dialogue/duration/text()", "duration", [...] ]
  }
}

Regards
Carmelo


(Magnus Bäck) #7

Sorry, I don't understand the question. What's the problem?


(carmelo) #8

Good morning Magnus, let me try to explain better.
I have this multiline log source.
Below is a file with 3 log entries. Each one starts with "INFO" and ends with "</dialogue>":

INFO 05-01-16 08:06:01 [http-nio-8080-exec-8] (AbstractServer.java:454) -
<dialogue>
<server>FirstLog</server>
<duration>311</duration>
[...]
</dialogue>
INFO 05-01-16 08:06:02 [http-nio-8080-exec-8] (AbstractServer.java:454) -
<dialogue>
<server>SecondLog</server>
<duration>500</duration>
[...]
</dialogue>
INFO 05-01-16 08:06:03 [http-nio-8080-exec-8] (AbstractServer.java:454) -
<dialogue>
<server>ThirdLog</server>
<duration>100</duration>
[...]
</dialogue>

I modified my filter following your tips, and now I am able to parse the Java log line and the XML.
But with my filters (posted above), Logstash is not able to understand where each log entry ends.
The output looks like this:

"message" => <dialogue>\n<server>FirstLog</server>\n<duration>311</duration>\n[...]\n</dialogue>\nINFO 05-01-16 08:06:02 [http-nio-8080-exec-8] (AbstractServer.java:454) -\n<dialogue>\n<server>SecondLog</server>\n<duration>500</duration>\n[...]\n</dialogue>\nINFO 05-01-16 08:06:03 [http-nio-8080-exec-8] (AbstractServer.java:454) -\n<dialogue>\n<server>ThirdLog</server>\n<duration>100</duration>\n[...]\n</dialogue>
"level" => "INFO",
"timestamp" => "05-01-16 08:06:01",
"msg_1" => "http-nio-8080-exec-8",
"file" => "AbstractServer.java",
"xmldata" => <dialogue>\n<server>FirstLog</server>\n<duration>311</duration>\n[...]\n</dialogue>\nINFO 05-01-16 08:06:02 [http-nio-8080-exec-8] (AbstractServer.java:454) -\n<dialogue>\n<server>SecondLog</server>\n<duration>500</duration>\n[...]\n</dialogue>\nINFO 05-01-16 08:06:03 [http-nio-8080-exec-8] (AbstractServer.java:454) -\n<dialogue>\n<server>ThirdLog</server>\n<duration>100</duration>\n[...]\n</dialogue>
"server" => [ [0] "FirstLog" ],
"duration" => [ [0] "311" ]

As a result, Logstash parses only the first XML document and ignores the other 2.
My final result should be:

{
  "message" => <dialogue>\n<server>FirstLog</server>\n<duration>311</duration>\n[...]\n</dialogue>\n
  "level" => "INFO",
  "timestamp" => "05-01-16 08:06:01",
  "msg_1" => "http-nio-8080-exec-8",
  "file" => "AbstractServer.java",
  "xmldata" => <dialogue>\n<server>FirstLog</server>\n<duration>311</duration>\n[...]\n</dialogue>\n
  "server" => [ [0] "FirstLog" ],
  "duration" => [ [0] "311" ]
}
{
  "message" => <dialogue>\n<server>SecondLog</server>\n<duration>500</duration>\n[...]\n</dialogue>\n
  "level" => "INFO",
  "timestamp" => "05-01-16 08:06:02",
  "msg_1" => "http-nio-8080-exec-8",
  "file" => "AbstractServer.java",
  "xmldata" => <dialogue>\n<server>SecondLog</server>\n<duration>500</duration>\n[...]\n</dialogue>\n
  "server" => [ [0] "SecondLog" ],
  "duration" => [ [0] "500" ]
}
{
  "message" => <dialogue>\n<server>ThirdLog</server>\n<duration>100</duration>\n[...]\n</dialogue>\n
  "level" => "INFO",
  "timestamp" => "05-01-16 08:06:03",
  "msg_1" => "http-nio-8080-exec-8",
  "file" => "AbstractServer.java",
  "xmldata" => <dialogue>\n<server>ThirdLog</server>\n<duration>100</duration>\n[...]\n</dialogue>\n
  "server" => [ [0] "ThirdLog" ],
  "duration" => [ [0] "100" ]
}

I hope this is clearer, and I hope you have time to give me some more tips.

Regards
Carmelo


(Magnus Bäck) #9

Okay, I get it. I'm not sure why it doesn't pick up the second event, but the fact that it doesn't pick up the last event in a file is actually expected. There's been some work to fix that, but I think it's still a problem.
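For the last-event problem specifically, one option is to move the multiline joining out of the filter section and into the input. The multiline codec has an auto_flush_interval option for this case: it emits a buffered entry once no new line has arrived for that many seconds. The sketch below assumes a Logstash version whose multiline codec supports that option. The path is a placeholder, and the 5-second interval is an arbitrary choice. Newer Logstash releases also steer users toward this codec rather than the multiline filter.

```
input {
  file {
    path => "/var/log/myapp/*.log"   # hypothetical path
    type => "oldLogs"
    codec => multiline {
      # A line that does not start with a log level belongs to the previous entry.
      pattern => "^%{LOGLEVEL} "
      negate => true
      what => "previous"
      # Emit a pending entry after 5 seconds without new lines, so the
      # last entry in the file is not held back indefinitely.
      auto_flush_interval => 5
    }
  }
}
```

With the codec in place, you drop the multiline block from the filter section, and the grok and xml filters stay as they are.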


(carmelo) #10

Thank you for your time,

I've got an idea:
can I collect all the log lines from the top "INFO" line down to the closing </dialogue> into one event and then parse it?

If so, how?
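Every entry ends with </dialogue>, so one way to do this is to use the closing tag as the end marker instead of using the INFO line as the start marker. With negate => true and what => "next", every line that does not contain </dialogue> is joined to the line after it. An event therefore runs from the INFO line down to and including the </dialogue> line. The sketch below assumes the closing tag never appears anywhere else in an entry. It would replace the existing multiline block in the filter section.

```
multiline {
  # Lines that do NOT contain </dialogue> belong to the NEXT line,
  # so each event ends with the line that holds the closing tag.
  pattern => "</dialogue>"
  negate => true
  what => "next"
}
```

The grok and xml filters can then run unchanged on the joined event. Each event is closed by its own last line rather than by the following INFO line, so the last entry in the file should not be left waiting for a successor.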

