Console XML works, but not file XML

We have written a logstash script to process an XMLfile and export contents to stdout{}. We have verified the script works when the input is specified as stdin{}

However, when we specify a file as the input, the XML does not process as expected and we get errors like: Error parsing xml with XmlSimple

Config file contents are:

input {
file {
path => "D:/logstash-7.4.1/conf/geoserver_audit_20191030_2.log"
sincedb_path => "nul"
start_position => "beginning"
type => "xml"
codec => multiline {
pattern => "Request>"
negate => true
what => "previous"
}
}
}

filter {
xml
{
source => "message"
xpath => [
"/Requests/Request/Service/text()", "service",
"/Requests/Request/Version/text()", "version",
"/Requests/Request/Operation/text()", "operation",
"/Requests/Request/SubOperation/text()", "sub_operation",
"/Requests/Request/Resources/text()", "resources",
"/Requests/Request/Path/text()", "path",
"/Requests/Request/QueryString/text()", "query_string",
"/Requests/Request/Body/text()", "body",
"/Requests/Request/HttpMethod/text()", "http_method",
"/Requests/Request/StartTime/text()", "start_time",
"/Requests/Request/EndTime/text()", "end_time",
"/Requests/Request/TotalTime/text()", "total_time",
"/Requests/Request/RemoteAddr/text()", "remote_addr",
"/Requests/Request/RemoteHost/text()", "remote_host",
"/Requests/Request/Host/text()", "host",
"/Requests/Request/RemoteUser/text()", "remote_host",
"/Requests/Request/ResponseStatus/text()", "response_status",
"/Requests/Request/ResponseLength/text()", "response_length",
"/Requests/Request/ResponseContentType/text()", "response_content_type",
"/Requests/Request/CacheResult/text()", "cache_result",
"/Requests/Request/MissReason/text()", "miss_reason",
"/Requests/Request/Failed/text()", "failed"
]
target => "xml_value"
}
#mutate {
#remove_field => [tags, host, message, xml_value]
#}
}

output {
stdout {}
}

XML contents are:
&gt?xml version="1.0" encoding="UTF-8" ?>
&gtRequests>&gtRequest id="34">
&gtService>&gt/Service>
&gtVersion>&gt/Version>
&gtOperation>&gt/Operation>
&gtSubOperation>&gt/SubOperation>
&gtResources>&gt/Resources>
&gtPath>null&gt/Path>
&gtQueryString>&gt/QueryString>
&gtBody>

&gt/Body>
&gtHttpMethod>GET&gt/HttpMethod>
&gtStartTime>2019-10-30T22:50:39.107Z&gt/StartTime>
&gtEndTime>2019-10-30T22:50:43.55Z&gt/EndTime>
&gtTotalTime>4,443&gt/TotalTime>
&gtRemoteAddr>10.2.142.1&gt/RemoteAddr>
&gtRemoteHost>10.2.142.1&gt/RemoteHost>
&gtHost>dev-alta-geo04.altalisdev.com&gt/Host>
&gtRemoteUser>anonymous&gt/RemoteUser>
&gtResponseStatus>200&gt/ResponseStatus>
&gtResponseLength>0&gt/ResponseLength>
&gtResponseContentType>&gt/ResponseContentType>
&gtCacheResult>&gt/CacheResult>
&gtMissReason>&gt/MissReason>
&gtFailed>false&gt/Failed>
&gt/Request>
&gtRequest id="35">
&gtService>&gt/Service>
&gtVersion>&gt/Version>
&gtOperation>&gt/Operation>
&gtSubOperation>&gt/SubOperation>
&gtResources>&gt/Resources>
&gtPath>/index.html&gt/Path>
&gtQueryString>&gt/QueryString>
&gtBody>

&gt/Body>
&gtHttpMethod>GET&gt/HttpMethod>
&gtStartTime>2019-10-30T22:50:43.581Z&gt/StartTime>
&gtEndTime>2019-10-30T22:50:43.612Z&gt/EndTime>
&gtTotalTime>31&gt/TotalTime>
&gtRemoteAddr>10.2.142.1&gt/RemoteAddr>
&gtRemoteHost>10.2.142.1&gt/RemoteHost>
&gtHost>dev-alta-geo04.altalisdev.com&gt/Host>
&gtRemoteUser>anonymous&gt/RemoteUser>
&gtResponseStatus>200&gt/ResponseStatus>
&gtResponseLength>697&gt/ResponseLength>
&gtResponseContentType>text/html;charset=UTF-8&gt/ResponseContentType>
&gtCacheResult>&gt/CacheResult>
&gtMissReason>&gt/MissReason>
&gtFailed>false&gt/Failed>
&gt/Request>
&gt/Requests>

Please edit your post and use markdown so that the XML is readable.

Post has been edited to make XML more readable

The problem is your multiline codec. It is not combining lines into valid XML. The first messages are

<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n<Requests><Request id=\"34\">...<Failed>false</Failed>
</Request>\n<Request id=\"35\">...<Failed>false</Failed>

When using target the xml filter will actually parse the first one, but the xpath option is completely unforgiving of any junk around the XML.

Change your codec to use

pattern => "<Request "
negate => true
what => "previous"

Then you need some filters to clean up the results

if [message] =~ /<\?xml/ { drop {} }
mutate { gsub => [ "message", "<(/)?Requests>", "" ] }

That leaves you with valid XML. You have some typos in your xpath expressions you will need to clean.

I actually would not do it using xpath. I don't like that everything is an array. Instead I would use

target => "xml_value"
store_xml => true
force_array => false

Now I realize you want to everything to have different names, but to get the names you want you basically add an underscore before an uppercase letter (except at start of key name) then fold to lowercase. That is easy enough to do in ruby...

ruby {
    code => '
        x = event.get("xml_value")
        if x
            x.each { |k, v|
                newk = k.gsub(/(?!^)([A-Z])/, "_\\1")
                newk = newk.downcase
                event.remove("[xml_value][#{k}]")
                event.set("[xml_value][#{newk}]", v)
            }
        end
    '
}

If you are new to Ruby regexps, (?!^) is a negative lookahead assertion which means do not match when ^ is at the start of the pattern. So this pattern only matches inside a key value, not at its start.

We have implemented the changes as outlined and we are now getting error: :exception=>#<REXML::ParseException: No close tag for /Requests

The updated config file is:
input {
file {
path => "D:/logstash-7.4.1/conf/geoserver_audit_20191030_2.log"
sincedb_path => "nul"
start_position => "beginning"
type => "xml"
codec => multiline {
pattern => "<Request "
negate => true
what => "previous"
}
}
}

filter {
xml
{
source => "message"
target => "xml_value"
store_xml => true
force_array => false
}
if [message] =~ /<?xml/ { drop {} }
mutate { gsub => [ "message", "<(/)?Requests>", "" ] }
ruby {
code => '
x = event.get("xml_value")
if x
x.each { |k, v|
newk = k.gsub(/(?!^)([A-Z])/, "_\1")
newk = newk.downcase
event.remove("[xml_value][#{k}]")
event.set("[xml_value][#{newk}]", v)
}
end
'
}
}

output {
stdout {}
}

The full error message is:
[2019-11-06T09:43:54,577][WARN ][logstash.filters.xml ][main] Error parsing xml with XmlSimple {:source=>"message", :value=>"<?xml version=\"1.0\" encoding=\"UTF-8\" ?>", :exception=>#<NoMethodError: undefined method attributes' for nil:NilClass>, :backtrace=>["D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:720:in get_attributes'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:464:in collapse'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:194:in xml_in'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in xml_in'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/logstash-filter-xml-4.0.7/lib/logstash/filters/xml.rb:185:in filter'", "D:/logstash-7.4.1/logstash-core/lib/logstash/filters/base.rb:143:in do_filter'", "D:/logstash-7.4.1/logstash-core/lib/logstash/filters/base.rb:162:in block in multi_filter'", "org/jruby/RubyArray.java:1800:in each'", "D:/logstash-7.4.1/logstash-core/lib/logstash/filters/base.rb:159:in multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:115:in multi_filter'", "D:/logstash-7.4.1/logstash-core/lib/logstash/java_pipeline.rb:243:in block in start_workers'"]}
[2019-11-06T09:43:54,718][WARN ][logstash.filters.xml ][main] Error parsing xml with XmlSimple {:source=>"message", :value=>"<Request id="34">\n \n \n \n \n \n null\n \n \n \n \n GET\n 2019-10-30T22:50:39.107Z\n 2019-10-30T22:50:43.55Z\n 4,443\n 10.2.142.1\n 10.2.142.1\n dev-alta-geo04.altalisdev.com\n anonymous\n 200\n 0\n \n \n \n false\n", :exception=>#<REXML::ParseException: No close tag for /Requests
Line: 26
Position: 782
Last 80 unconsumed characters:

, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:28:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'", "D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/logstash-filter-xml-4.0.7/lib/logstash/filters/xml.rb:185:in `filter'", "D:/logstash-7.4.1/logstash-core/lib/logstash/filters/base.rb:143:in `do_filter'", "D:/logstash-7.4.1/logstash-core/lib/logstash/filters/base.rb:162:in `block in multi_filter'", "org/jruby/RubyArray.java:1800:in `each'", "D:/logstash-7.4.1/logstash-core/lib/logstash/filters/base.rb:159:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:115:in `multi_filter'", "D:/logstash-7.4.1/logstash-core/lib/logstash/java_pipeline.rb:243:in `block in start_workers'"]}
D:/logstash-7.4.1/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated

These have to come before the xml filter because the are transforming the message into valid xml.

Thanks, we no longer get XML errors. However the console display is not quite what we were expecting. Our XML contains 2 records, one for id 34 and one for id 35. When we initially run logstash, it displays all the XML values for id 34, but not for id 35. When I hit Ctrl-C to abort the script, it then displayed the XML values for id 35. Is this expected behavior? Is there a way to get the console display to immediately dispaly the XML values for both rows?

Use the auto_flush_interval on the multiline codec.

Thanks. Console output is now as expected.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.