XML file: multiple lines that do not share a root tag


(Timothy Eichmann) #1

My problem is that I am reading log files that contain XML. But it looks like this:

<exclusive-start id="51" timestamp="2018-09-18T21:28:08.474" intervalms="4907.088">
  <response-info timems="0.023" idlems="0.023" threads="0" lastid="0000000001010500" lastname="main" />
</exclusive-start>
<af-start id="52" totalBytesRequested="45312" timestamp="2018-09-18T21:28:08.474" intervalms="6529.630" />
<cycle-start id="53" type="scavenge" contextid="0" timestamp="2018-09-18T21:28:08.474" intervalms="6529.633" />
<gc-start id="54" type="scavenge" contextid="53" timestamp="2018-09-18T21:28:08.474">
  <mem-info id="55" free="9387749200" total="10737418240" percent="87">
    <mem type="nursery" free="0" total="1342177280" percent="0">
      <mem type="allocate" free="0" total="671088640" percent="0" />
      <mem type="survivor" free="0" total="671088640" percent="0" />
    </mem>
    <mem type="tenure" free="9387749200" total="9395240960" percent="99">
      <mem type="soa" free="8917987152" total="8925478912" percent="99" />
      <mem type="loa" free="469762048" total="469762048" percent="100" />
    </mem>
    <remembered-set count="46355" />
  </mem-info>
</gc-start>
...
<allocation-stats totalBytes="623062072" >
...
<gc-end id="57" type="scavenge" contextid="53" durationms="48.150" usertimems="171.875" systemtimems="0.000" timestamp="2018-09-18T21:28:08.521" activeThreads="4">
...
</gc-end>
...
<exclusive-end id="62" timestamp="2018-09-18T21:28:08.521" durationms="51.568" />

<exclusive-start id=".....

After that, it just starts over again with new data.

As you can see, a lot of data belongs in one "document", but the XML isn't wrapped in a common root tag. Each group is just separated from the next one by a blank line.

I have set up Filebeat to handle the multiline grouping. But when I ship the events to Logstash and use the XML filter, I run into problems because there is no root tag around the tags that belong together. Using xpath in the filter does not work, since there is no root to work with.

When I manually added a dummy tag around everything and let it go through the pipeline, it worked fine and I got all the data nicely in separate fields in Elasticsearch.
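To illustrate why the dummy tag helps, here is a small Python sketch (Python and its standard xml.etree.ElementTree module are used only for illustration; they are not part of the Filebeat/Logstash setup). It parses a trimmed event from the sample above once as-is and once wrapped in a dummy root element. An XML document may have only one top-level element, so the bare fragment is rejected while the wrapped one parses.

```python
import xml.etree.ElementTree as ET

# A trimmed event from the sample log: several sibling elements
# with no common parent element.
EVENT = """\
<exclusive-start id="51" timestamp="2018-09-18T21:28:08.474" intervalms="4907.088">
  <response-info timems="0.023" idlems="0.023" threads="0" lastid="0000000001010500" lastname="main" />
</exclusive-start>
<af-start id="52" totalBytesRequested="45312" timestamp="2018-09-18T21:28:08.474" intervalms="6529.630" />
<exclusive-end id="62" timestamp="2018-09-18T21:28:08.521" durationms="51.568" />
"""


def try_parse(text):
    """Return the parsed root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


# Without a root: the parser fails at the second top-level element
# ("junk after document element"), so this returns None.
bare = try_parse(EVENT)

# With a dummy root: the whole event is one well-formed document.
wrapped = try_parse("<dummy>" + EVENT + "</dummy>")

print(bare is None)                      # True
print([child.tag for child in wrapped])  # ['exclusive-start', 'af-start', 'exclusive-end']
```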

So the big question: is it possible to add this dummy tag somehow with Filebeat/Logstash?

PS: Before you ask: no, sadly I cannot change the way this XML gets logged. That would solve everything...


(Timothy Eichmann) #2

OK, I solved it myself. Here is how:

Filebeat handles the multiline part: it groups the tags that belong together into one message and ships that message to Logstash.
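The thread does not show the actual Filebeat multiline settings, so as a rough illustration only, here is a Python sketch of the same grouping idea. It assumes, as described in the question, that events are separated by one or more blank lines and never contain a blank line themselves; group_events is a hypothetical helper, not a Filebeat feature.

```python
import re


def group_events(log_text):
    """Split raw log text into multiline events (hypothetical helper).

    Assumption: events are separated by blank (or whitespace-only) lines
    and never contain a blank line themselves.
    """
    # A newline, any run of whitespace, then another newline marks a separator.
    chunks = re.split(r"\n\s*\n", log_text)
    # Drop empty chunks (from leading/trailing blank lines) and trim each event.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


LOG = """<exclusive-start id="51"/>
<af-start id="52"/>
<exclusive-end id="62"/>

<exclusive-start id="63"/>
<exclusive-end id="70"/>
"""

events = group_events(LOG)
print(len(events))  # 2
```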

A Logstash filter solves the rest:

# Wrap the grouped event in a dummy root element so it becomes well-formed XML
mutate { replace => { "message" => "<dummy>%{message}</dummy>" } }

Now I can run the XML filter with xpath:

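xml {
    # Field that holds the wrapped event (a required setting of the xml filter)
    source => "message"
    # xpath is evaluated from the document node, so paths start at the dummy root
    xpath => [
        "dummy/exclusive-start/@timestamp", "logTimeStamp",
        "dummy/......
    ]
}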
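For comparison, here is a Python sketch of the same two steps (wrap in a dummy root, then pull attributes out by path) using the standard xml.etree.ElementTree module. It is only an illustration, not what Logstash does internally. ElementTree paths are relative to the root element, so "exclusive-start" here corresponds to "dummy/exclusive-start" in the Logstash xpath. Only the logTimeStamp field comes from the thread; the other entries in the hypothetical FIELDS map are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical field map: (path relative to <dummy>, attribute) -> output field.
# Only "logTimeStamp" comes from the thread; the others are illustrative.
FIELDS = {
    ("exclusive-start", "timestamp"): "logTimeStamp",
    ("gc-end", "durationms"): "gcDurationMs",
    ("exclusive-end", "durationms"): "exclusiveDurationMs",
}


def parse_event(message):
    """Wrap one grouped event in <dummy> and extract the mapped attributes."""
    # Same idea as the mutate/replace step: give the fragment a single root.
    root = ET.fromstring("<dummy>" + message + "</dummy>")
    event = {}
    for (path, attr), field in FIELDS.items():
        element = root.find(path)  # first matching child, or None
        if element is not None and attr in element.attrib:
            event[field] = element.get(attr)
    return event


MESSAGE = """<exclusive-start id="51" timestamp="2018-09-18T21:28:08.474" intervalms="4907.088" />
<gc-end id="57" type="scavenge" contextid="53" durationms="48.150" />
<exclusive-end id="62" timestamp="2018-09-18T21:28:08.521" durationms="51.568" />"""

print(parse_event(MESSAGE))
# {'logTimeStamp': '2018-09-18T21:28:08.474', 'gcDurationMs': '48.150', 'exclusiveDurationMs': '51.568'}
```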

This now puts all the data nicely into separate fields of a single event.

