Difficulty parsing/filtering an XML file



I'm trying to parse an XML file (an export of Arbor alerts; example here: http://pastebin.com/jpQWMbjz),
but it currently fails. Either most lines show up unparsed in Kibana, or nothing gets inserted into Elasticsearch.

Trouble parsing xml with XmlSimple {:source=>"message", :value=>"</peakflow>", :exception=>#<REXML::ParseException: #<REXML::ParseException: Missing end tag for '' (got "peakflow")

Line: 1
Position: 11

As the source file validates with xmllint, I'm thinking the problem is more likely the multiline rule not extracting the document correctly (though that shouldn't matter for an XML source...)
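
To illustrate that suspicion outside of Logstash, here is a minimal Python sketch. It is not the xml filter itself (which uses REXML in Ruby); it only shows that a valid document stops being parseable once it is cut into one event per line. The sample is a hypothetical, heavily shortened stand-in for the export: only the `peakflow` root name comes from the error above, and the `alert` element is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the string parses as a complete XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical stand-in for the Arbor export; only the <peakflow> root
# element name is taken from the error message, <alert> is invented.
sample = """<peakflow>
  <alert id="1"/>
</peakflow>"""

# One "event" per line, as a line-oriented input delivers it when the
# multiline grouping does not kick in.
per_line = [is_well_formed(line) for line in sample.splitlines()]

# The whole document delivered as a single event.
whole = is_well_formed(sample)

print(per_line)
print(whole)
```

If the closing `</peakflow>` line reaches the parser on its own, as the `:value` field in the error suggests, it cannot parse, even though the file as a whole is valid.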

Manual call

cat /vagrant/example-logs/Arbor-Alerts.xml | /opt/logstash/bin/logstash -f /vagrant/confs/logstash/logstash.conf

You can find the vagrant box and the configuration I use here

I also tried with this ELK configuration to check whether it was my setup, but got the same results.

Any hints?


(Joshua Rich) #2

I believe the xml filter expects the XML to appear more stream-like and less human-readable, if that makes sense :smile: What happens if you avoid your xmllint call and ensure the XML input does not have extra whitespace in it (including newlines)?


I tried "cleaning" the file with:

$ tidy -xml -q file1.xml | tr -d '\n' > file2.xml

The output is a single line without indentation or extra newlines.
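
For comparison, a rough Python sketch of the same flattening step. The function name `flatten_xml` is hypothetical, and the regex approach assumes the document has no CDATA sections or whitespace-sensitive text, which I have not checked for the Arbor export. Unlike `tr -d '\n'`, it keeps exactly one trailing newline, on the unverified assumption that a line-oriented input needs a line terminator before it emits the event.

```python
import re


def flatten_xml(text: str) -> str:
    """Collapse an indented XML document onto one line ending in a newline."""
    # Drop whitespace that sits purely between tags (indentation, line breaks).
    one_line = re.sub(r">\s+<", "><", text.strip())
    # Replace any line breaks left inside text content with a single space.
    one_line = re.sub(r"\s*\n\s*", " ", one_line)
    # Terminate the single line so a line-based reader sees a complete line.
    return one_line + "\n"


# Hypothetical sample; only the <peakflow> root name comes from the thread.
pretty = '<peakflow>\n  <alert id="1"/>\n</peakflow>\n'
flat = flatten_xml(pretty)
print(repr(flat))
```

The same function could be applied to the contents of file1.xml and the result written to file2.xml.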

I resubmitted the file both with and without multiline disabled, but nothing was imported and there was no debug output either (with either the debug or the elasticsearch output).
