How to parse XML from a single line


#1

Hello,

Before starting, thank you all.

The fact is that I am trying to parse an XML file using Logstash's xml filter, but I am getting stuck.

The XML format is as follows:

<?xml version="1.0" encoding="UTF-8"?>

where the first line contains <?xml.....> and another line (just one) contains the rest of the data. Keep in mind that there is more than one ... node, even though just one appears in the example above.

What I have until now in my logstash.conf is:

input {
	file {
		path => "C:\dummy.xml"
		start_position => "beginning"
		sincedb_path => "/dev/null"
		type => "xml"
	}
}

filter {
	if [type] == "xml" {
		xml {
			remove_namespaces => true
			source => "message"
			store_xml => false
			target => "xml_content"
			xpath => ["/Logger/logEntry/@event", "event"]
		}

		mutate {
			add_field => ["dummy_event", "%{event}"]
		}
	}
}

output {
stdout{}
}

but I can only get the first line of the XML in the Logstash output:

2018-05-15T15:07:02.116Z XXXXXXXXX <?xml version="1.0" encoding="UTF-8"?> (where XXXXXXXXX is the PC name)

Thank you again.


#2

You are asserting that the file input fails to read the second line of a two line file? Does the second line end with a newline character?


#3

I may be mistaken, but in my experience with the "target" option, data will not go into a field that has not already been defined or created. I would try adding the "xml_content" field before directing the data there.

@magnusbaeck stated something similar in the article here:


#4

Actually, setting target makes no sense when store_xml => false is set.
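
To make the distinction concrete, here is a minimal sketch of the two usage patterns (the field names are just placeholders):

	# Pattern 1: store_xml => false keeps only the xpath extractions;
	# target is ignored in this mode.
	xml {
		source => "message"
		store_xml => false
		xpath => ["/Logger/logEntry/@event", "event"]
	}

	# Pattern 2: store_xml => true stores the whole parsed document
	# under the target field.
	xml {
		source => "message"
		store_xml => true
		target => "parsed"
	}

In other words, pick one of the two: either extract individual values with xpath, or store the full parse tree under target.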


#5

Hello,

Thank you a lot for replying.

I have been working on it a bit more and I have managed to get the whole XML as a single event, after reformatting it across several lines:

2018-05-16T08:07:40.335Z XXXXXXXXXXX <?xml version="1.0" encoding="UTF-8"?>
        <Logger xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" fileFormatVersion="1" recordID="2013.11.04_08.22.07" xsi:type="Logger">
                <logEntry index="1" kms="dummy" mode="dummy" nameUser1="dummy" nameUser2="" event="dummy">
                        <entryMAC>XXXXXXXXXXXXXXXX</entryMAC>
                        <logTimestamp second="7" minute="22" hour="8" day="4" month="7" year="2015"/>
                </logEntry>
                <logEntry index="2" kms="dummy" mode="dummy" nameUser1="dummy" nameUser2="dummy" event="dummy_event">
                        <entryMAC>XXXXXXXXXXXXXXXX</entryMAC>
                        <logTimestamp second="2" minute="24" hour="8" day="4" month="9" year="2015"/>
                </logEntry>
        </Logger>

The logstash configuration I am using is the following:

input {
	file{
		path => "C:\dummy.xml"
		start_position => "beginning"
		sincedb_path => "/dev/null"
		type => "xml"		
		codec => multiline {
			pattern => "^<\?xml .*\?>"
			negate => "true"
			what => previous
			auto_flush_interval => 1
		}
	}
}

filter {
	if [type] == "xml" {
		xml {
			namespaces => {
				"xsi" => "http://www.w3.org/2001/XMLSchema-instance"
			}
			store_xml => true
			source => "message"
			target => "parsed"
		}
		
		split {
			field => "[parsed][xml][Logger]"
			add_field => {
				dummy_event => "%{[parsed][xml][Logger][logEntry]}"
			}
		}

		mutate {
			add_field => ["dummy_event_field", "%{dummy_event}"]
		}
	}
}

output {
	stdout{}
}

Do you know any way to divide each logEntry into a different event? Maybe using the Split filter?

I have tried the latter without success: I obtain the correct parsing, but within a single event, inside the ".logEntry" field in Kibana:

{
  "nameUser1": "dummy",
  "mode": "dummy",
  "event": "dummy",
  "nameUser2": "",
  "index": "1",
  "entryMAC": [
    "XXXXXXXXXXXXXXXX"
  ],
  "logTimestamp": [
    {
      "day": "4",
      "month": "7",
      "minute": "22",
      "hour": "8",
      "second": "7",
      "year": "2015"
    }
  ],
  "kms": "dummy"
},
{
  "nameUser1": "dummy",
  "mode": "dummy",
  "event": "dummy_event"",
  "nameUser2": "dummy",
  "index": "2",
  "entryMAC": [
    "XXXXXXXXXXXXXXXX"
  ],
  "logTimestamp": [
    {
      "day": "4",
      "month": "9",
      "minute": "24",
      "hour": "8",
      "second": "2",
      "year": "2015"
    }
  ],
  "kms": "dummy"
}

Thank you again.


#6

Yes, use split.

split { field => "[parsed][logEntry]" }
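
In context, a minimal sketch against the configuration from #5 (assuming target => "parsed", so the logEntry array ends up at [parsed][logEntry]):

	filter {
		xml {
			source => "message"
			store_xml => true
			target => "parsed"
		}

		# Emits one copy of the event per element of the array;
		# in each copy, [parsed][logEntry] holds a single entry.
		split {
			field => "[parsed][logEntry]"
		}
	}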

#7

Hello,

Thank you a lot for the support. The split filter finally worked and now I am getting each logEntry as a separate event.

I would like to share another blocking point I am facing: when the XML file is bigger than 500 lines, with the same format as the previous examples, Logstash throws what seems to be an error.

ion=>#<REXML::ParseException: No close tag for /Logger/logEntry[125]
Line: 501
Position: 51650
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:28:in `parse'", "uri:classloader:/META-INF/jruby.hom
e/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "C:/
SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.
0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:20
3:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/lib/logstash/filters/xml.rb:187:in `filter'", "C:/SOF
T/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:16
4:in `block in multi_filter'", "org/jruby/RubyArray.java:1734:in `each'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:161:in `m
ulti_filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filter_delegator.rb:47:in `multi_filter'", "(eval):118:in `block in initialize'", "o
rg/jruby/RubyArray.java:1734:in `each'", "(eval):115:in `block in initialize'", "(eval):102:in `block in filter_func'", "C:/SOFT/logstash-6.2.3/logsta
sh-core/lib/logstash/pipeline.rb:447:in `filter_batch'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:426:in `worker_loop'", "C:/SOF
T/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:385:in `block in start_workers'"]}
[2018-05-24T14:03:29,491][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[parsed][logEntry] is of type = NilClass

[2018-05-24T14:03:29,991][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"message", :value=>"\t</logEntry>\r\n\t<totalMA
C>5B075DE7858BF4BE</totalMAC>\r\n</Logger>\r", :exception=>#<REXML::ParseException: Missing end tag for '' (got "logEntry")
Line: 1
Position: 14
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:341:in `pull_event'", "uri:classloader:/META-INF/jru
by.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:185:in `pull'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:
23:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/rub
y/stdlib/rexml/document.rb:45:in `initialize'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse
'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/j
ruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/l
ib/logstash/filters/xml.rb:187:in `filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'", "C:/SOFT/logstash
-6.2.3/logstash-core/lib/logstash/filters/base.rb:164:in `block in multi_filter'", "org/jruby/RubyArray.java:1734:in `each'", "C:/SOFT/logstash-6.2.3/
logstash-core/lib/logstash/filters/base.rb:161:in `multi_filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filter_delegator.rb:47:in `multi
_filter'", "(eval):118:in `block in initialize'", "org/jruby/RubyArray.java:1734:in `each'", "(eval):115:in `block in initialize'", "(eval):102:in `bl
ock in filter_func'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:447:in `filter_batch'", "C:/SOFT/logstash-6.2.3/logstash-core/lib
/logstash/pipeline.rb:426:in `worker_loop'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:385:in `block in start_workers'"]}
[2018-05-24T14:03:30,007][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[parsed][logEntry] is of type = NilClass

I think it is related to the maximum number of events that Logstash can manage before applying filters. By default this is 125, based on the pipeline.batch.size setting in logstash.yml. In addition, each entry in the XML occupies 4 lines, and using that as a divisor of the whole file gives 500 / 4 = 125.

However, I have added the following line to logstash.yml without any effect:

pipeline.batch.size = 1000.
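
One thing I am not sure about is the syntax: logstash.yml is a YAML file, so I believe the setting needs a colon rather than an equals sign:

	# logstash.yml is YAML; a 'key = value' line would not be parsed
	pipeline.batch.size: 1000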

Do you have any idea about what is happening?
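
In the meantime I will also look at the multiline codec: if I read the documentation correctly, it has a max_lines option that defaults to 500 lines, which would match the failure at line 501. Something like:

	codec => multiline {
		pattern => "^<\?xml .*\?>"
		negate => "true"
		what => previous
		auto_flush_interval => 1
		max_lines => 20000	# value chosen arbitrarily; the default is reportedly 500
	}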

Thank you again.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.