Parse XML log lines

I would like some recommendations on how to parse an XML document that is split across log lines with Logstash.
My document looks like this:

```
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<root xmlns="http://xxxxxxxxxxx/5.0">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<file-version>2.2</file-version>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<generation-date>2018-06-19T07:27:48.900+02:00</generation-date>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<report>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<status>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<info id="PushOK" type="number">44</info>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<info id="PushFailure" type="number">0</info>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |</status>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<task exec="2018-06-19T06:05:00.000+02:00" id="XXXXXXXXX_06_2018">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<transaction id="1" type="dfzlmsi" start="2018-06-19T06:27:00.000+02:00" stop="2018-06-19T06:27:00.000+02:00" retry="0" status="failed" reason="lost device"/>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |</target>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<target id="xxxxx8514" type="X1">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<transaction id="1" type="dfzlmsi" start="2018-06-19T06:27:00.000+02:00" stop="2018-06-19T06:27:00.000+02:00" retry="0" status="failed" reason="lost device"/>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |</target>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |<taskStatus ko="290" ok="24" status="partially_failed"/>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</task>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</report>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</root>
```

My goal is to aggregate the XML report and extract data from it.

If you format your log example as preformatted text, using markdown notation or the </> toolbar button, we'll actually be able to see what it looks like.

Thank you for your answer. I have updated my post.
My question: is it possible to use the aggregate filter to build a valid XML output and then process it with the xml filter? If so, could you please give me an example?

Thank you in advance.
PS: | is a tab character (\t).

I don't know if an aggregate filter would be the best option here. I've never used it. I'd probably use a multiline codec to join all log entries into a single event and then use a ruby filter to chop it up and remove the non-XML data from everything but the first line.
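Roughly what I have in mind, as an untested sketch: the file path, the `xml_raw` field name, the `auto_flush_interval` value, and the assumption that the syslog-style prefix never contains a `<` are all mine to illustrate the idea, so adjust them to your setup.

```
input {
  file {
    # Hypothetical path; point this at the real log file.
    path => "/var/log/app/report.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # A line containing "<root" starts a new document; every other
      # line is appended to the previous event.
      pattern => "<root"
      negate => true
      what => "previous"
      # Flush the last document even if no new "<root" line arrives.
      auto_flush_interval => 5
      # Raise this if a report can exceed the default of 500 lines.
      max_lines => 10000
    }
  }
}

filter {
  # Strip the "timestamp host ping - <tab>" prefix from every line by
  # keeping everything from the first "<" onwards, then re-join the lines.
  ruby {
    code => '
      xml = event.get("message").split("\n").map { |line|
        i = line.index("<")
        i ? line[i..-1] : line
      }.join("\n")
      event.set("xml_raw", xml)
    '
  }

  xml {
    source => "xml_raw"
    target => "report"
    remove_namespaces => true
  }
}

output {
  stdout { codec => rubydebug }
}
```

If you only need a handful of values, the xml filter's xpath option can pull them out directly instead of mapping the whole document under a target field. Check the rubydebug output first to see how the parsed structure ends up nested.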

How do you know that a sequence of log entries like this one won't ever interlace with each other if they're logged at the same time?

I launch Logstash with a single pipeline worker:

```
logstash --pipeline.workers 1 -f logstash.conf
```

I meant on the logging end. Are all lines for a given XML document guaranteed to be logged atomically, with no chance of any other messages slipping in between? If yes, why are the timestamps different in the example above?

There is no such risk.
