Logstash XML Filter Plugin - XML Parsing General Question

Your new example XML has six lines. A file input like

file { path => "/home/user/foo.txt" sincedb_path => "/dev/null" start_position => beginning }

will consume that as six separate events. For the second one:

<AV9APIDATA xmlns="av9api-platform-com">

the xml filter will complain ":exception=>#<REXML::ParseException: No close tag for /AV9APIDATA". That's because the closing /AV9APIDATA tag is in the sixth event, not the second.

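You need to use a multiline codec to consume the entire XML document as a single event. For example, if you need to consume the entire file as one event, you could use a pattern that never matches any line. With negate => true and what => previous, every line is then appended to the previous one, and auto_flush_interval emits the event once no new lines have arrived for 2 seconds: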

file {
    path => "/home/user/foo.txt"
    sincedb_path => "/dev/null"
    start_position => beginning 
    codec => multiline { 
        pattern => "^Spalanzani" 
        negate => true 
        what => previous 
        auto_flush_interval => 2
    }
}

If you do that then the xml filter will parse it just fine.
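
For reference, a complete pipeline built around that input might look like the sketch below. The filter and output sections are my assumptions, not taken from your configuration: the target field name theXML is only a placeholder, and force_array => false is a matter of taste.

input {
    file {
        path => "/home/user/foo.txt"
        sincedb_path => "/dev/null"
        start_position => beginning
        codec => multiline {
            pattern => "^Spalanzani"
            negate => true
            what => previous
            auto_flush_interval => 2
        }
    }
}
filter {
    # Parse the whole document held in [message].
    # "theXML" is a placeholder field name, pick whatever suits you.
    xml {
        source => "message"
        target => "theXML"
        force_array => false
    }
}
output {
    # Print each event so you can check the parsed structure
    stdout { codec => rubydebug }
}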

Note, if you have two XML documents in a file, for example

<?xml version="1.0" encoding="utf-16"?>
<AV9APIDATA xmlns="av9api-platform-com"> <ORDER EngineID="2"> </ORDER>
</AV9APIDATA>
<?xml version="1.0" encoding="utf-16"?>
<AV9APIDATA xmlns="av9api-platform-com"> <ORDER EngineID="3"> </ORDER>
</AV9APIDATA>

then the whole-file approach puts both documents into a single event, and you will get a different exception: attempted adding second root element to document.

In that case, use a different pattern, one that ends each event at a line starting with a closing tag, so that each document becomes its own event:

codec => multiline { 
    pattern => "^</" 
    negate => true 
    what => next  # Note previous changed to next
    auto_flush_interval => 2 
}
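
To see why this splits the file correctly, here is how I would expect the codec to treat each line of the two-document example above. This is a hand trace, not captured output.

# <?xml version="1.0" encoding="utf-16"?>
#     does not match "^</", so it is joined to the next line
# <AV9APIDATA xmlns="av9api-platform-com"> <ORDER EngineID="2"> </ORDER>
#     starts with "<A", does not match, so it is joined to the next line
# </AV9APIDATA>
#     matches "^</", so event 1 ends here
# <?xml version="1.0" encoding="utf-16"?>
#     does not match, so it starts event 2 and is joined to the next line
# <AV9APIDATA xmlns="av9api-platform-com"> <ORDER EngineID="3"> </ORDER>
#     does not match, so it is joined to the next line
# </AV9APIDATA>
#     matches "^</", so event 2 ends here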

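That will work provided that your XML is pretty-printed with indentation, so that the only closing tag at the start of a line is the one for the root element. If a nested element's closing tag is left-aligned, the event will end early at that tag and the xml filter will fail again. In that case you may have to resort to something like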

codec => multiline { 
    pattern => "^</AV9APIDATA" 
    negate => true 
    what => next  # Note previous changed to next
    auto_flush_interval => 2 
}

which provides very little flexibility, since the root element name is hard-coded into the pattern.
