I posted a topic about this earlier, but it got bogged down with craziness. I'm hoping that is why there was limited assistance from the community.
I am trying to ingest DMARC aggregate XML reports. Here is my pipeline:
input {
  file {
    path => "C:/DMARC/*.xml"
    discover_interval => 5
  }
}
filter {
  xml {
    target => "doc"
    source => "message"
    force_array => false
    remove_namespaces => true
  }
}
output {
  elasticsearch {
    hosts => "ElasticSearch:9200"
    user => "elastic"
    password => "elastic"
    http_compression => true
    manage_template => false
    index => "dmarcxml-%{+YYYY.MM.dd}"
  }
}
Here's a sample of the data that is being ingested:
<?xml version="1.0" encoding="windows-1252"?><feedback xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:ns1='http://dmarc.org/dmarc-xml/0.1' xsi:schemaLocation='http://dmarc.org/dmarc-xml/0.1 dmarc_agg_report.xsd'><report_metadata><org_name>AOL</org_name><email>postmaster@aol.com</email><report_id>example.com_1517011200</report_id><date_range><begin>1516924800</begin><end>1517011200</end></date_range></report_metadata>
<policy_published><domain>example.com</domain><adkim>r</adkim><aspf>r</aspf><p>none</p><sp>none</sp><pct>100</pct></policy_published>
<record><row><source_ip>192.168.1.1</source_ip><count>1</count><policy_evaluated><disposition>none</disposition><spf>fail</spf></policy_evaluated></row><identifiers><header_from>example.com</header_from></identifiers><auth_results><dkim><domain>not.evaluated</domain><result>none</result></dkim><spf><domain>example.com</domain><scope>mfrom</scope><result>permerror</result></spf></auth_results></record>
<record><row><source_ip>192.168.1.1</source_ip><count>1</count><policy_evaluated><disposition>none</disposition><spf>fail</spf></policy_evaluated></row><identifiers><header_from>example.com</header_from></identifiers><auth_results><dkim><domain>not.evaluated</domain><result>none</result></dkim><spf><domain>example.com</domain><scope>mfrom</scope><result>permerror</result></spf></auth_results></record>
<record><row><source_ip>204.232.172.40</source_ip><count>1</count><policy_evaluated><disposition>none</disposition><spf>fail</spf></policy_evaluated></row><identifiers><header_from>example.com</header_from></identifiers><auth_results><dkim><domain>not.evaluated</domain><result>none</result></dkim><spf><domain>example.com</domain><scope>mfrom</scope><result>permerror</result></spf></auth_results></record>
<record><row><source_ip>192.168.1.2</source_ip><count>2</count><policy_evaluated><disposition>none</disposition><spf>fail</spf></policy_evaluated></row><identifiers><header_from>example.com</header_from></identifiers><auth_results><dkim><domain>not.evaluated</domain><result>none</result></dkim><spf><domain>example.com</domain><scope>mfrom</scope><result>permerror</result></spf></auth_results></record>
Problem 1:
The xml filter doesn't seem to know how to handle the feedback element (or is it an attribute?) with its namespace declarations. It complains that feedback has no closing tag, and it appears to ignore some of the XML. Here is the error I see when attempting to ingest a DMARC aggregate report:
[2018-02-06T23:02:31,358][WARN ][logstash.filters.xml ] Error parsing xml with XmlSimple :source=>"message", :value=>"<?xml version="1.0" encoding="windows-1252"?><feedback xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:ns1='http://dmarc.org/dmarc-xml/0.1' xsi:schemaLocation='http://dmarc.org/dmarc-xml/0.1 dmarc_agg_report.xsd'><report_metadata><org_name>AOL</org_name><email>postmaster@aol.com</email><report_id>example.com_1517011200</report_id><date_range><begin>1516924800</begin><end>1517011200</end></date_range></report_metadata>", :exception=>#<REXML::ParseException: No close tag for /feedback Line: 1 Position: 438 Last 80 unconsumed characters: >, :backtrace=>["C:/Logstash/vendor/jruby/lib/ruby/stdlib/rexml/parsers/treeparser.rb:28:in `parse'",
"C:/Logstash/vendor/jruby/lib/ruby/stdlib/rexml/document.rb:288:in `build'",
"C:/Logstash/vendor/jruby/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'",
"C:/Logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'",
"C:/Logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'",
"C:/Logstash/vendor/bundle/jruby/2.3.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'",
"C:/Logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-xml-4.0.5/lib/logstash/filters/xml.rb:187:in `filter'",
"C:/Logstash/logstash-core/lib/logstash/filters/base.rb:145:in `do_filter'",
"C:/Logstash/logstash-core/lib/logstash/filters/base.rb:164:in `block in multi_filter'",
"org/jruby/RubyArray.java:1734:in `each'",
"C:/Logstash/logstash-core/lib/logstash/filters/base.rb:161:in `multi_filter'",
"C:/Logstash/logstash-core/lib/logstash/filter_delegator.rb:48:in `multi_filter'",
"(eval):42:in `block in filter_func'",
"C:/Logstash/logstash-core/lib/logstash/pipeline.rb:455:in `filter_batch'",
"C:/Logstash/logstash-core/lib/logstash/pipeline.rb:434:in `worker_loop'",
"C:/Logstash/logstash-core/lib/logstash/pipeline.rb:393:in `block in start_workers'"]
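One possible explanation, offered as a guess rather than a confirmed diagnosis: the file input emits one event per line by default, so the xml filter only ever sees the first line of the report, where feedback is opened but not yet closed. Below is a minimal sketch of an input that joins each report into a single event using the multiline codec. The pattern, the auto_flush_interval value, and the start_position and sincedb_path settings are assumptions chosen for testing; none of them come from the pipeline above.

```
input {
  file {
    path => "C:/DMARC/*.xml"
    discover_interval => 5
    # Assumption: re-read files from the top on every run while testing.
    start_position => "beginning"
    # Assumption: "NUL" discards read positions on Windows (testing only).
    sincedb_path => "NUL"
    codec => multiline {
      # Any line that does NOT start with the XML declaration is
      # appended to the previous line, so one report becomes one event.
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      # Flush the pending event after 2 seconds without new lines,
      # otherwise the last report in a file is held back indefinitely.
      auto_flush_interval => 2
    }
  }
}
```

With this in place the xml filter should receive the whole document in message, which only parses if the file really does end with a closing feedback tag. The multiline codec also caps events at 500 lines by default (max_lines), which may matter for large reports.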