XML Input Issues

We have been asked to look into it. We are a monitoring team (Zabbix, Lansweeper, Splunk, and evaluating ELK), but we had never heard of DMARC until this request. I think the data collection has only been turned on for a few days.

Thanks

DMARC/SPF/DKIM are all email protection measures. The Federal government, or maybe it was just the DoD (I can't recall at the moment), put out a mandate to become DMARC compliant. DMARC tells remote organizations how to handle mail they receive from you based on SPF/DKIM alignment, and it covers aggregate and forensic reporting. The aggregate reports are sent from remote MTAs to you if you have reporting turned on via a DNS TXT record.
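
For example, turning on aggregate reporting just means publishing a TXT record along these lines (example.com and the report mailbox are placeholders, and p= should be whatever policy you actually want to advertise):

_dmarc.example.com.  IN  TXT  "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"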

We are a relatively small organization in terms of email volume, but I am seeing us send out about 50k messages a day. DMARC reports are showing another 50-75k messages a day coming from IPs other than our own. It would greatly benefit our reputation to enable SPF/DKIM to knock that number down significantly. I also know that we are not alone regarding DMARC implementation, and I am working to create a solid method to analyze these reports without paying hundreds or thousands to service providers.
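
SPF, like DMARC, is just a DNS TXT record on the sending domain; something along these lines, where example.com and the include are placeholders for your real senders:

example.com.  IN  TXT  "v=spf1 mx include:_spf.example.com -all"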

I worked on this some today, mostly self-education, but some testing too. I noticed that with a small test file, I didn't get the last record. Should the pattern be

pattern => "<record>|<\/feedback>"

to pick up the last record?

In the multiline codec, do you have auto_flush_interval configured? That's what solved that behavior for me.

This is about 99% of the complete product. I ran into some issues when I implemented it at work regarding the indexing of the report.end field. It's fixed in my work config, which I don't have access to at this very moment. This should get you close, though.

If you are running on Windows, or have a Windows box to work from, I have a script that is just about complete. It extracts the XML from the compressed archives, reorders the XML to include the policy and metadata in each record, and then saves it. The script works fine, it just doesn't have proper error handling yet, which I am working on and should have done before the end of the weekend.
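
The reordering step boils down to something like the sketch below (a minimal illustration with made-up paths, not the actual script): clone the report-level <report_metadata> and <policy_published> elements into every <record>, so each event the multiline codec produces is self-contained and matches the xpath expressions in the filter.

# Minimal illustration only -- paths are placeholders, no error handling.
# Copy the report-level metadata and policy into every <record> so each
# Logstash event carries the report.* and policy.* fields on its own.
$in  = 'C:\DMARC\raw\report.xml'
$out = 'C:\DMARC\report.xml'

[xml]$doc = Get-Content -Raw $in
$meta   = $doc.feedback.report_metadata
$policy = $doc.feedback.policy_published

foreach ($rec in $doc.feedback.record) {
    [void]$rec.AppendChild($meta.CloneNode($true))
    [void]$rec.AppendChild($policy.CloneNode($true))
}
$doc.Save($out)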

input {
  file {
    id => "C:\DMARC\*.xml"
    path => "C:/DMARC/*.xml"
    discover_interval => 5
    codec => multiline {
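      # negate/pattern/what: any line that does not start a new <record> is
      # appended to the previous event; auto_flush_interval emits the last
      # buffered record after 5 seconds with no new input (the "missing last
      # record" fix discussed above)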
      auto_flush_interval => 5
      negate => true
      pattern => "<record>"
      what => "previous"
    }
  }
}
filter {
  xml {
    id => "Field Extraction"
    store_xml => false
    source => "message"
    xpath => [
      "record/report_metadata/org_name/text()", "report.org",
      "record/report_metadata/email/text()", "report.org_contact",
      "record/report_metadata/extra_contact_info/text()", "report.additional_contact",
      "record/report_metadata/report_id/text()", "report.id",
      "record/report_metadata/date_range/begin/text()", "report.start",
      "record/report_metadata/date_range/end/text()", "report.end",
      "record/policy_published/domain/text()", "policy.domain",
      "record/policy_published/aspf/text()", "policy.spf_mode",
      "record/policy_published/adkim/text()", "policy.dkim_mode",
      "record/policy_published/p/text()", "policy.dmarc.domain_action",
      "record/policy_published/sp/text()", "policy.dmarc.subdomain_action",
      "record/policy_published/pct/text()", "policy.percentage",
      "record/row/source_ip/text()", "email.source_ip",
      "record/row/count/text()", "email.count",
      "record/row/policy_evaluated/disposition/text()", "email.dmarc_action",
      "record/row/policy_evaluated/spf/text()", "email.spf_evaluation",
      "record/row/policy_evaluated/dkim/text()", "email.dkim_evaluation",
      "record/row/policy_evaluated/reason/type/text()", "dmarc.override_type",
      "record/row/policy_evaluated/reason/comment/text()", "dmarc.override_comment",
      "record/identifiers/envelope_to/text()", "email.envelope_to",
      "record/identifiers/envelope_from/text()", "email.envelope_from",
      "record/identifiers/header_from/text()", "email.header_from",
      "record/auth_results/dkim/domain/text()", "authresult.dkim_domain",
      "record/auth_results/dkim/result/text()", "authresult.dkim_result",
      "record/auth_results/spf/domain/text()", "authresult.spf_domain",
      "record/auth_results/spf/scope/text()", "authresult.spf_scope",
      "record/auth_results/spf/result/text()", "authresult.spf_result"
    ]
  }
  geoip {
    id => "IP Geo-Mapping"
    source => "email.source_ip"
    add_field => {
      "[geoip][location][coordinates]" => "%{[geoip][location][lat]}, %{[geoip][location][lon]}"
    }
    remove_field => ["@version", "_score", "_type", "host"]
  }
  if "_geoip_lookup_failure" in [tags] {
    drop { }
  }
}
output {
  elasticsearch {
    id => "Send to Elasticsearch"
    hosts => ["Elastic2012:9200"]
#    user => "elastic"
#    password => "elastic"
    http_compression => true
    template => "C:/logstash/templates/dmarcxmltemplate.json"
    template_name => "dmarcxml"
    index => "dmarcxml-%{+YYYY.MM.DD}"
  }
}

Also, to get this to work as intended, you'll need the Elasticsearch template that's referenced in the output of the pipeline. Just save the template linked below to a file and point the template setting in the output section at it. You may also want to adjust the number of shards or replicas, depending on your environment; you can see those settings near the top.

https://pastebin.com/H9LsAp8a

Ah, no, I didn't have that. This is my first use of the multiline codec.

I think these files are created complete, rather than being the usual appended logs, so I hadn't considered this.

Well, I had to heavily comment out a bunch of commands in my PowerShell script. I fed it 880 archives containing the reports, and for some reason it wouldn't decompress all of them; of the 880, it only decompressed 230. I narrowed it down to a set of archives, but there was no commonality between them: some were gzip, others zip format, and there was no common sender either. The only thing I can think of is that a common MTA application doing the compression is using some weird encoding or something. Regardless, it adds another manual step to the process, which is unfortunate.
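
For anyone wanting to try the same thing, the extraction loop amounts to something like this sketch (paths are placeholders and the error handling is minimal, so treat it as a starting point rather than the finished script). It may also be worth checking the first bytes of each file (0x1F 0x8B for gzip, "PK" for zip) instead of trusting the extension, in case the problem archives are simply mislabeled:

# Sketch only: placeholder paths, minimal error handling.
$source = 'C:\DMARC\incoming'
$dest   = 'C:\DMARC'

Get-ChildItem -Path $source -File | ForEach-Object {
    $report = $_
    try {
        switch ($report.Extension.ToLower()) {
            '.zip' {
                # Built-in cmdlet handles zip archives
                Expand-Archive -Path $report.FullName -DestinationPath $dest -Force
            }
            '.gz' {
                # No gzip cmdlet, so fall back to .NET streams
                $inStream  = [System.IO.File]::OpenRead($report.FullName)
                $gzStream  = New-Object System.IO.Compression.GzipStream($inStream, [System.IO.Compression.CompressionMode]::Decompress)
                $outStream = [System.IO.File]::Create((Join-Path $dest $report.BaseName))
                $gzStream.CopyTo($outStream)
                $outStream.Dispose(); $gzStream.Dispose(); $inStream.Dispose()
            }
            default { Write-Warning "Skipping unrecognized file: $($report.Name)" }
        }
    }
    catch {
        # Log the failure and keep going instead of stopping the whole batch
        Write-Warning "Failed to extract $($report.Name): $_"
    }
}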

Sometime this week I will get around to finally dumping it all into an "official" GitHub project for others to get ahold of and tear apart.
