XML plugin parse

Hello,

I've been using the XML filtering plugin because I need to parse some XML data.

This is a simple example:

<task code="a01" status="wip"/>
<task code="a02" status="nwg"/>
<task code="a03" status="nwg">
     Description Line 1
     Description Line 2
     Description Line 3
     Description Line 4
</task>
<task code="a04" status="wip">
     <comment author="afusco">
         I've finished this part.
     </comment>
</task>

I'm trying to extract the code and status of these tasks:

filter {
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/task/@code", "task_code",
      "/task/@status", "task_status"
    ]
  }
}

The thing is:

It correctly parses the lines that contain just a <task> tag; I can see the output is correct. But the remaining lines, for example the comment tags, are parsed incorrectly.

To avoid this, I added the following simple condition to drop the lines that aren't task tags:

  if [message] !~ /^<task/ {
    drop { }
  }

But this is a workaround.

  • Is there any way to parse only the desired tags, so that Elasticsearch doesn't also receive the undesired data? It would be good to drop any data that isn't matched by the xpath array.

Example of the output:

When it's a task tag:

{
    "path" => "/var/log/xml2.log",
    "@timestamp" => 2022-05-05T16:22:24.602Z,
    "@version" => "1",
    "task_code" => [
        [0] "a01"
    ],
          "task_status" => [
        [0] "wip"
    ],

When it's not:

{
      "@version" => "1",
    "@timestamp" => 2022-05-06T12:27:01.559Z,
          "host" => "elastic",
       "message" => "\t<comment author="afusco">",
          "path" => "/var/log/xml2.log"
}

Thanks.

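The input emits one event per line, so the xml filter only ever sees fragments such as a lone <comment> line. Use a multiline codec on the input to consume an entire XML document (here, each <task> element with its children) as a single event.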

Possibly something like:

  codec => multiline {
      pattern => "<task"
      negate => true
      what => "previous"
      auto_flush_interval => 10
  }
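
Putting the pieces together, a full pipeline might look roughly like the sketch below. It is untested against your data and makes a few assumptions: the file path is taken from the "path" field in your output, start_position and the stdout output are only there for experimenting, and the final conditional is an optional safety net I added, not something the xml filter does by itself.

```
input {
  file {
    path => "/var/log/xml2.log"        # assumed from the "path" field in the output above
    start_position => "beginning"      # assumption: re-read the file while testing
    codec => multiline {
      # Any line that does NOT contain "<task" is appended to the previous line,
      # so each <task> element and its children become one event.
      pattern => "<task"
      negate => true
      what => "previous"
      # Flush the last pending event after 10 seconds without new lines.
      auto_flush_interval => 10
    }
  }
}

filter {
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/task/@code", "task_code",
      "/task/@status", "task_status"
    ]
  }

  # Optional: drop anything from which no task code could be extracted,
  # so it never reaches Elasticsearch.
  if ![task_code] {
    drop { }
  }
}

output {
  stdout { codec => rubydebug }
}
```

With the codec in place, the <comment> lines should no longer arrive as separate events, so the /^<task/ workaround should not be needed. The `if ![task_code]` check is closer to what you asked for in the bullet point: it drops an event whenever the xpath expression for the code matched nothing.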