How to parse XML from a single line


#1

Hello,

Before starting, thank you all.

The fact is that I am trying to parse an XML file using Logstash's xml filter, but I am getting stuck.

The XML format is as follows:

<?xml version="1.0" encoding="UTF-8"?>

where the first line contains <?xml.....> and another line (just one) contains the rest of the data. Keep in mind that there is more than one ... node, even though just one appears in the example above.

What I have until now in my logstash.conf is:

input {
	file {
		path => "C:\dummy.xml"
		start_position => "beginning"
		sincedb_path => "/dev/null"
		type => "xml"
	}
}

filter {
	if [type] == "xml" {
		xml {
			remove_namespaces => true
			source => "message"
			store_xml => false
			target => "xml_content"
			xpath => ["/Logger/logEntry/@event", "event"]
		}

		mutate {
			add_field => ["dummy_event", "%{event}"]
		}
	}
}

output {
stdout{}
}

but I can only get the first line of the XML in the Logstash output:

2018-05-15T15:07:02.116Z XXXXXXXXX <?xml version="1.0" encoding="UTF-8"?> (where XXXXXXXXX is the PC name)

Thank you again.


#2

You are asserting that the file input fails to read the second line of a two line file? Does the second line end with a newline character?


#3

I may be mistaken, but in my experience with the "target" option, data will not go into a field that has not already been defined or created. I would try adding the "xml_content" field before directing the data there.

@magnusbaeck stated something similar in the article here:


#4

Actually, setting target makes no sense when store_xml => false is set.
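
To make the distinction concrete, here is a minimal sketch of the two usage patterns (the field names are just placeholders):

	# Pattern 1: store_xml => false keeps only the xpath extractions;
	# target is ignored in this mode.
	xml {
		source => "message"
		store_xml => false
		xpath => ["/Logger/logEntry/@event", "event"]
	}

	# Pattern 2: store_xml => true stores the whole parsed document
	# under the target field.
	xml {
		source => "message"
		store_xml => true
		target => "parsed"
	}

In other words, pick one of the two: either extract individual values with xpath, or store the full parse tree under target.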


#5

Hello,

Thank you a lot for replying.

I have been working on it a bit more and I have managed to get the whole XML as a single event, after reformatting it across several lines:

2018-05-16T08:07:40.335Z XXXXXXXXXXX <?xml version="1.0" encoding="UTF-8"?>
        <Logger xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" fileFormatVersion="1" recordID="2013.11.04_08.22.07" xsi:type="Logger">
                <logEntry index="1" kms="dummy" mode="dummy" nameUser1="dummy" nameUser2="" event="dummy">
                        <entryMAC>XXXXXXXXXXXXXXXX</entryMAC>
                        <logTimestamp second="7" minute="22" hour="8" day="4" month="7" year="2015"/>
                </logEntry>
                <logEntry index="2" kms="dummy" mode="dummy" nameUser1="dummy" nameUser2="dummy" event="dummy_event">
                        <entryMAC>XXXXXXXXXXXXXXXX</entryMAC>
                        <logTimestamp second="2" minute="24" hour="8" day="4" month="9" year="2015"/>
                </logEntry>
        </Logger>

The logstash configuration I am using is the following:

input {
	file{
		path => "C:\dummy.xml"
		start_position => "beginning"
		sincedb_path => "/dev/null"
		type => "xml"		
		codec => multiline {
			pattern => "^<\?xml .*\?>"
			negate => "true"
			what => previous
			auto_flush_interval => 1
		}
	}
}

filter {
	if [type] == "xml" {
		xml {
			namespaces => {
				"xsi" => "http://www.w3.org/2001/XMLSchema-instance"
			}
			store_xml => true
			source => "message"
			target => "parsed"
		}
		
		split {
			field => "[parsed][xml][Logger]"
			add_field => {
				dummy_event => "%{[parsed][xml][Logger][logEntry]}"
			}
		}

		mutate {
			add_field => ["dummy_event_field", "%{dummy_event}"]
		}
	}
}

output {
	stdout{}
}

Do you know any way to divide each logEntry into a different event? Maybe using the Split filter?

I have tried the latter without success: I obtain the correct parsing, but within a single event, inside the ".logEntry" field in Kibana:

{
  "nameUser1": "dummy",
  "mode": "dummy",
  "event": "dummy",
  "nameUser2": "",
  "index": "1",
  "entryMAC": [
    "XXXXXXXXXXXXXXXX"
  ],
  "logTimestamp": [
    {
      "day": "4",
      "month": "7",
      "minute": "22",
      "hour": "8",
      "second": "7",
      "year": "2015"
    }
  ],
  "kms": "dummy"
},
{
  "nameUser1": "dummy",
  "mode": "dummy",
  "event": "dummy_event"",
  "nameUser2": "dummy",
  "index": "2",
  "entryMAC": [
    "XXXXXXXXXXXXXXXX"
  ],
  "logTimestamp": [
    {
      "day": "4",
      "month": "9",
      "minute": "24",
      "hour": "8",
      "second": "2",
      "year": "2015"
    }
  ],
  "kms": "dummy"
}

Thank you again.


#6

Yes, use split.

split { field => "[parsed][logEntry]" }
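
In context, a minimal sketch against the configuration from #5 (assuming target => "parsed", so the logEntry array ends up at [parsed][logEntry]):

	filter {
		xml {
			source => "message"
			store_xml => true
			target => "parsed"
		}

		# Emits one copy of the event per element of the array;
		# in each copy, [parsed][logEntry] holds a single entry.
		split {
			field => "[parsed][logEntry]"
		}
	}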

#7

Hello,

Thank you a lot for the support. The split filter finally worked and now I am getting each logEntry as a separate event.

I would like to share another blocking point I am facing: when the XML file is bigger than 500 lines, with the same format as the previous examples, Logstash throws what seems to be an error.

ion=>#<REXML::ParseException: No close tag for /Logger/logEntry[125]
Line: 501
Position: 51650
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:28:in `parse'", "uri:classloader:/META-INF/jruby.hom
e/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "C:/
SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.
0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:20
3:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/lib/logstash/filters/xml.rb:187:in `filter'", "C:/SOF
T/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:16
4:in `block in multi_filter'", "org/jruby/RubyArray.java:1734:in `each'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:161:in `m
ulti_filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filter_delegator.rb:47:in `multi_filter'", "(eval):118:in `block in initialize'", "o
rg/jruby/RubyArray.java:1734:in `each'", "(eval):115:in `block in initialize'", "(eval):102:in `block in filter_func'", "C:/SOFT/logstash-6.2.3/logsta
sh-core/lib/logstash/pipeline.rb:447:in `filter_batch'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:426:in `worker_loop'", "C:/SOF
T/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:385:in `block in start_workers'"]}
[2018-05-24T14:03:29,491][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[parsed][logEntry] is of type = NilClass

[2018-05-24T14:03:29,991][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"message", :value=>"\t</logEntry>\r\n\t<totalMA
C>5B075DE7858BF4BE</totalMAC>\r\n</Logger>\r", :exception=>#<REXML::ParseException: Missing end tag for '' (got "logEntry")
Line: 1
Position: 14
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:341:in `pull_event'", "uri:classloader:/META-INF/jru
by.home/lib/ruby/stdlib/rexml/parsers/baseparser.rb:185:in `pull'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:
23:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/rub
y/stdlib/rexml/document.rb:45:in `initialize'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse
'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/j
ruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'", "C:/SOFT/logstash-6.2.3/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/l
ib/logstash/filters/xml.rb:187:in `filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'", "C:/SOFT/logstash
-6.2.3/logstash-core/lib/logstash/filters/base.rb:164:in `block in multi_filter'", "org/jruby/RubyArray.java:1734:in `each'", "C:/SOFT/logstash-6.2.3/
logstash-core/lib/logstash/filters/base.rb:161:in `multi_filter'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/filter_delegator.rb:47:in `multi
_filter'", "(eval):118:in `block in initialize'", "org/jruby/RubyArray.java:1734:in `each'", "(eval):115:in `block in initialize'", "(eval):102:in `bl
ock in filter_func'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:447:in `filter_batch'", "C:/SOFT/logstash-6.2.3/logstash-core/lib
/logstash/pipeline.rb:426:in `worker_loop'", "C:/SOFT/logstash-6.2.3/logstash-core/lib/logstash/pipeline.rb:385:in `block in start_workers'"]}
[2018-05-24T14:03:30,007][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[parsed][logEntry] is of type = NilClass

I think it is related to the maximum number of events that Logstash can manage before applying filters. By default this is 125, based on the pipeline.batch.size setting in logstash.yml. In addition, each entry in the XML occupies 4 lines, and using that as a divisor of the whole file gives 500 / 4 = 125.

However, I have added the following line to logstash.yml without any effect:

pipeline.batch.size = 1000.
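
One thing I am not sure about is the syntax: logstash.yml is a YAML file, so I believe the setting needs a colon rather than an equals sign:

	# logstash.yml is YAML; a 'key = value' line would not be parsed
	pipeline.batch.size: 1000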

Do you have any idea about what is happening?
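
In the meantime I will also look at the multiline codec: if I read the documentation correctly, it has a max_lines option that defaults to 500 lines, which would match the failure at line 501. Something like:

	codec => multiline {
		pattern => "^<\?xml .*\?>"
		negate => "true"
		what => previous
		auto_flush_interval => 1
		max_lines => 20000	# value chosen arbitrarily; the default is reportedly 500
	}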

Thank you again.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.