Parse XML log lines


(Boukhdhira) #1

I would like some recommendations on how to parse an XML document that has been split into log lines, using Logstash.
My document looks like this:

```
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<root xmlns="http://xxxxxxxxxxx/5.0">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<file-version>2.2</file-version>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<generation-date>2018-06-19T07:27:48.900+02:00</generation-date>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<report>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<status>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<info id="PushOK" type="number">44</info>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<info id="PushFailure" type="number">0</info>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |</status>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<task exec="2018-06-19T06:05:00.000+02:00" id="XXXXXXXXX_06_2018">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<transaction id="1" type="dfzlmsi" start="2018-06-19T06:27:00.000+02:00" stop="2018-06-19T06:27:00.000+02:00" retry="0" status="failed" reason="lost device"/>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |</target>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<target id="xxxxx8514" type="X1">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<transaction id="1" type="dfzlmsi" start="2018-06-19T06:27:00.000+02:00" stop="2018-06-19T06:27:00.000+02:00" retry="0" status="failed" reason="lost device"/>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |</target>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |<taskStatus ko="290" ok="24" status="partially_failed"/>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</task>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</report>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</root>
```

My goal is to reassemble the XML report and extract data from it.


(Magnus Bäck) #2

If you format your log example as preformatted text using markdown notation or the </> toolbar button we'll actually be able to see what it looks like.


(Boukhdhira) #3

Thank you for your answer. I have updated my post.
My question: is it possible to use an aggregate filter to rebuild a valid XML document and then process it with an xml filter? If so, could you please give me an example?

Thank you in advance.
PS: | is a tab character (\t).


(Magnus Bäck) #4

I don't know if an aggregate filter would be the best option here. I've never used it. I'd probably use a multiline codec to join all log entries into a single event and then use a ruby filter to chop it up and remove the non-XML data from everything but the first line.
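
A minimal sketch of what that could look like, assuming the log is read from a file and that the prefix is separated from the XML by a tab; the path and field names below are made up for illustration:

```
input {
  file {
    # Path is an assumption; point it at the actual log file
    path => "/var/log/report.log"
    codec => multiline {
      # A line containing the opening <root ...> tag starts a new event;
      # every other line is appended to the previous event
      pattern => "<root"
      negate => true
      what => "previous"
      auto_flush_interval => 5
    }
  }
}

filter {
  # Strip the "timestamp host ping - <TAB>" prefix from every line so that
  # only the XML fragments remain, then join them back together
  ruby {
    code => '
      xml = event.get("message").split("\n").map { |line|
        line.split("\t", 2).last
      }.join("\n")
      event.set("xml_payload", xml)
    '
  }

  # Parse the reassembled document; remove_namespaces avoids having to deal
  # with the xmlns on <root> when accessing the resulting fields
  xml {
    source => "xml_payload"
    target => "report"
    remove_namespaces => true
  }
}
```

The negate/what combination in the multiline codec means every line that does not contain `<root` is appended to the previous event, so each `<root>...</root>` document ends up as a single event before the ruby and xml filters run.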

How do you know that a sequence of log entries like this one won't ever interlace with each other if they're logged at the same time?


(Boukhdhira) #5

I launch Logstash with a single pipeline worker:

```
logstash --pipeline.workers 1 -f logstash.conf
```


(Magnus Bäck) #6

I meant on the logging end. Are all lines for a given XML document guaranteed to be logged atomically, with no chance of any other messages slipping in between? If yes, why are the timestamps different in the example above?


(Boukhdhira) #7

There is no such risk.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.