Split XML

I have an XML doc being sent to a logstash server/ELK server.

This is the portion of the xml I'm trying to work with:

\<row refno="1" percent="100">
\<col>OMXEOC0\</col>\<col>100\</col>\<col>00E4\</col>
\</row>
\<row refno="2" percent="100">
\<col>PWXLST02\</col>\<col>100\</col>\<col>00D3\</col>
\</row>
\<row refno="3" percent="100">
\<col>OMXEDSST\</col>\<col>100\</col>\<col>0097\</col>
\</row>
\<row refno="4" percent="100">
\<col>BHA5947\</col>\<col>100\</col>\<col>026C\</col>
\</row>
\<row refno="5" percent="100">
\<col>MAN3333G\</col>\<col>100\</col>\<col>00F7\</col>
\</row>

^^^I had to use \ before each tag so that it wasn't read as actual XML^^^

What I want to do is split this single event into multiple events, each containing the information within one <row> tag (all the info in the row tag as well as the embedded <col> tags).

So all of this would be in its own separate message/event:

\<row refno="5" percent="100">
\<col>MAN3333G\</col>\<col>100\</col>\<col>00F7\</col>
\</row>

This is my current code (which doesn't work):

split {
      field => "[ddsml][report][row]"
}
xml {
      source => "message"
      target => "parsed"
      xpath => [
              "/ddsml/report/row[1]/col[1]/text()","Job",
              "/ddsml/report/row[1]/@percent","Percent_Delay"
      ]
}

Any idea how I can do that? I've asked before but the guy wasn't able to help. I've been working on this for a couple days now. I know it's possible. I've read everything about the split filter but that doesn't give any examples for XML.

From what I can tell, the split filter doesn't directly support XML.

If you have multiple values in the same message, which is what I think you're saying, then you could try parsing the xml then running the split over the results - as suggested in this post.

Just note that you can't run the xml filter on invalid xml, so you would have to wrap the row items in a containing tag before putting it through the xml plugin.

e.g.

add_field => {"newfield" => "<something>%{message}</something>"}
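As a rough sketch of how that wrapping could fit into a pipeline: the add_field would sit in a mutate filter ahead of the xml filter, and the xml filter would then read the wrapped copy. The field name "wrapped" and the <rows> root element are made up for illustration, and this assumes the message holds only the bare row elements.

```
filter {
    # Wrap the row fragments in one root element so the text is well-formed XML.
    # "wrapped" and <rows> are hypothetical names chosen for this sketch.
    mutate { add_field => { "wrapped" => "<rows>%{message}</rows>" } }

    # Parse the wrapped copy rather than the original message.
    xml { source => "wrapped" target => "parsed" remove_field => [ "wrapped" ] }

    # Each <row> becomes one element of the [parsed][row] array; split emits one event per element.
    split { field => "[parsed][row]" }
}
```

If the message already has a single root element, the mutate step should not be needed.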

I know that my XML is valid, because I am able to parse out each item within the "row" tags and place them in a separate field. (I didn't include all of the XML in the snippet because I didn't think it was necessary.) So that part works just fine.

But you're saying that add_field => {"newfield" => "<something>%{message}</something>"} is going to be INSIDE the XML plugin? I thought it would be within a Split plugin.

Does this help? This configuration

input { generator { count => 1 lines => [ '<rows>
<row refno="1" percent="100">
<col>OMXEOC0</col><col>100</col><col>00E4</col>
</row>
<row refno="2" percent="100">
<col>PWXLST02</col><col>100</col><col>00D3</col>
</row>
</rows>
' ] } }
filter {
    xml { source => "message" target => "parsed" remove_field => [ "message" ] }
    split { field => "[parsed][row]" }
}
output { stdout { codec => rubydebug { metadata => false } } }

will produce two events that look like

{
    "@timestamp" => 2019-10-24T14:31:07.154Z,
    [...]
    "parsed" => {
        "row" => {
                "col" => [
                [0] "OMXEOC0",
                [1] "100",
                [2] "00E4"
            ],
              "refno" => "1",
            "percent" => "100"
        }
    }
}

The split has to come after the xml filter and reference the target.
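Once the split has run, [parsed][row] on each event is a single hash, so individual values can be copied out with ordinary field references. A minimal sketch, reusing the Job and Percent_Delay names from the xpath attempt in the question and assuming the first <col> is the job name:

```
filter {
    xml { source => "message" target => "parsed" remove_field => [ "message" ] }
    split { field => "[parsed][row]" }

    # After the split each event carries exactly one row, so these references
    # resolve to that row's values.
    mutate {
        add_field => {
            "Job" => "%{[parsed][row][col][0]}"
            "Percent_Delay" => "%{[parsed][row][percent]}"
        }
    }
}
```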

I think that does, yes. At least it helps point me in a better direction.

When you say it "produces two events that look like:" do those two events have different col[0] values? Meaning one has OMXEOC0 and the other has PWXLST02? Or are both values the same?

If they are different then that's EXACTLY what I need.

Different. The other one has

    "row" => {
            "col" => [
            [0] "PWXLST02",
            [1] "100",
            [2] "00D3"
        ],

Awesome. Then yes this helps. I'll tweak to my needs and run it. I'll let you know if that fully solves my issue. Thank you.

No - sorry - without the full picture of the xml I assumed that was the whole snippet. Looks like you have a solution well on the way with @Badger's input below. Good luck!


When you specify split { field => "[parsed][row]" } is "[parsed][row]" just the XML path to row?

Or is [parsed] supposed to be the target, and then [row] is the field you want split?

Does that question^ make sense?

What I think is the answer is: split { field => " [path] [to] [field] " }

It may be clearer if you run it once without the split filter. With the xml filter target option set to parsed, that XML will result in an event that has a [parsed] field that contains an array called [row] (the <rows> root element itself is not kept). So the array you want to split is [parsed][row]
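For example, a cut-down version of the earlier filter with the split disabled might look like the sketch below. It assumes the same generator input as before; the rubydebug output should then show the array nested under the xml target, and that nested path is what the split filter's field option needs to name.

```
filter {
    xml { source => "message" target => "parsed" remove_field => [ "message" ] }
    # split { field => "[parsed][row]" }    # disabled while inspecting the structure
}
output { stdout { codec => rubydebug { metadata => false } } }
```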

Ok, you're right, that makes way more sense. However, when my target option is set to "parsed", I get two different fields: one called "parsed.report" (which has that array but also other arrays with different info) and one called "parsed.server".

So my code looks like this: split { field => "[parsed.report][row]" }, but I get a split type error. Any thoughts on that?

I think you will find it is [parsed][report], so you want to split [parsed][report][row]


THANK YOU.

I had to split the parsed.report field first, then split one of the resulting fields which was parsed.report.row.

So this is the code that solves my problem:

split { field => "[parsed][report]" }
split { field => "[parsed][report][row]"}
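Put together with the xml filter, the whole filter section might look roughly like this. It is a sketch that assumes the ddsml/report/row layout implied by the xpath expressions in the question; the xml filter settings are carried over from the earlier example rather than taken from the final working config.

```
filter {
    xml { source => "message" target => "parsed" }

    # The xml filter's force_array option defaults to true, so [parsed][report]
    # is itself an array. Splitting it first leaves a single report hash per event.
    split { field => "[parsed][report]" }

    # With the report unwrapped, its row array can be split into one event per row.
    split { field => "[parsed][report][row]" }
}
```

That default would explain why a single split on [parsed][report][row] was not enough: until the first split runs, [parsed][report] is an array rather than a hash with a row member.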

Thank you both for all the help! I've been trying this for so long, I almost broke my brain.

Successfully made an HTTP request from logstash to a z/OS Mainframe's RMF Distributed Data Server for CPU information in ELK. <--- In case anyone in the future needs to know how to do this.
