Parse XML log lines


(Boukhdhira) #1

I would like some recommendations on how to parse an XML document that has been split into log lines, using Logstash.
My document looks like this:

```
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<root xmlns="http://xxxxxxxxxxx/5.0">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<file-version>2.2</file-version>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<generation-date>2018-06-19T07:27:48.900+02:00</generation-date>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<report>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<status>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<info id="PushOK" type="number">44</info>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<info id="PushFailure" type="number">0</info>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |</status>
2018-06-19T07:07:24+02:00 127.0.0.1 ping  -  |<task exec="2018-06-19T06:05:00.000+02:00" id="XXXXXXXXX_06_2018">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<transaction id="1" type="dfzlmsi" start="2018-06-19T06:27:00.000+02:00" stop="2018-06-19T06:27:00.000+02:00" retry="0" status="failed" reason="lost device"/>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |</target>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<target id="xxxxx8514" type="X1">
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |<transaction id="1" type="dfzlmsi" start="2018-06-19T06:27:00.000+02:00" stop="2018-06-19T06:27:00.000+02:00" retry="0" status="failed" reason="lost device"/>
2018-06-19T07:29:09+02:00 127.0.0.1 ping  -  |</target>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |<taskStatus ko="290" ok="24" status="partially_failed"/>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</task>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</report>
2018-06-19T07:07:32+02:00 127.0.0.1 ping  -  |</root>
```

My goal is to reassemble the XML report and extract data from it.


(Magnus Bäck) #2

If you format your log example as preformatted text using markdown notation or the </> toolbar button we'll actually be able to see what it looks like.


(Boukhdhira) #3

Thank you for your answer. I have updated my post.
My question: is it possible to use an aggregate filter to rebuild a valid XML document and then process it with an xml filter? If so, could you please give me an example?

Thank you in advance.
PS: | is a tab character (\t).


(Magnus Bäck) #4

I don't know if an aggregate filter would be the best option here. I've never used it. I'd probably use a multiline codec to join all log entries into a single event and then use a ruby filter to chop it up and remove the non-XML data from everything but the first line.
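
A minimal sketch of what that could look like, assuming the log is read from a file and that the prefix is separated from the XML by a tab; the path and field names below are made up for illustration:

```
input {
  file {
    # Path is an assumption; point it at the actual log file
    path => "/var/log/report.log"
    codec => multiline {
      # A line containing the opening <root ...> tag starts a new event;
      # every other line is appended to the previous event
      pattern => "<root"
      negate => true
      what => "previous"
      auto_flush_interval => 5
    }
  }
}

filter {
  # Strip the "timestamp host ping - <TAB>" prefix from every line so that
  # only the XML fragments remain, then join them back together
  ruby {
    code => '
      xml = event.get("message").split("\n").map { |line|
        line.split("\t", 2).last
      }.join("\n")
      event.set("xml_payload", xml)
    '
  }

  # Parse the reassembled document; remove_namespaces avoids having to deal
  # with the xmlns on <root> when accessing the resulting fields
  xml {
    source => "xml_payload"
    target => "report"
    remove_namespaces => true
  }
}
```

The negate/what combination in the multiline codec means every line that does not contain `<root` is appended to the previous event, so each `<root>...</root>` document ends up as a single event before the ruby and xml filters run.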

How do you know that a sequence of log entries like this one won't ever interlace with each other if they're logged at the same time?


(Boukhdhira) #5

I launch Logstash with a single pipeline worker:

```
logstash --pipeline.workers 1 -f logstash.conf
```


(Magnus Bäck) #6

I meant on the logging end. Are all lines for a given XML document guaranteed to be logged atomically, with no chance of any other messages slipping in between? If yes, why are the timestamps different in the example above?


(Boukhdhira) #7

There is no such risk.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.