DMARC XML ingest

We've recently set up the Elastic Stack and are looking into potential uses for our organization (we've identified quite a few). One capability that ELK seems like it would be fantastic for, but that I haven't been able to find, is ingesting and presenting data from DMARC aggregate reports, which are XML formatted. These XML files are generated by a receiving mail server and sent to sender domains. They're formatted in the following way:

Schema Statement - Single line
Published Policy - Single line
Comparison of received email against the published policy - One line for each IP the receiving mail server received email from (sketched below)
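
To make that concrete, here is a trimmed sketch of what one of these reports looks like (placeholder domain and values; the real files follow the RFC 7489 aggregate-report schema):

<?xml version="1.0" encoding="UTF-8"?>  <!-- the schema statement -->
<feedback>
  <report_metadata>…</report_metadata>
  <policy_published>  <!-- the published policy, one per report -->
    <domain>example.com</domain>
    <p>none</p>
  </policy_published>
  <record>…</record>  <!-- one per sending IP -->
  <record>…</record>
</feedback>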

Has anybody tackled this yet? Does anybody want to? Any good resources that could teach me how to do this?

We've renamed ELK to the Elastic Stack; otherwise Beats and APM feel left out! :wink: Check out https://www.elastic.co/elk-stack

Have you looked at https://www.elastic.co/guide/en/logstash/6.1/plugins-filters-xml.html?

I'm a novice when it comes to what the Elastic Stack is and how it works. The documentation I've read leaves me with significantly more questions than answers. My organization does not have money to invest in training for a product we are still feeling out, nor do I have the money to do it personally. This is why I asked whether it had been done yet, or whether there were resources that could teach me how to do it.

I appreciate the response but snark is non-productive.

Snark?

The documentation that is out there overwhelmingly refers to this as the ELK Stack. Your first suggestion to a new/potential customer PoC'ing your product is that they should call it by its new name, and then you point them to a marketing time-sink page? That's pretty frustrating and really sets the tone.

Secondly, you're asking me to refer to documentation. I may have jumped the gun (due in part to the preceding useless information), and I understand that some people fail to read the docs, but it's still annoying to see that as the sole recommendation.

These DMARC reports are not formatted the way most XML files are, or at least not the ones I've seen. There is some relational information between the Published Policy line and the subsequent Record lines.

  • How would Logstash or Elastic handle this?
  • Could it handle it properly?
  • Does the Schema line hold any value in retaining?
  • Does the Published Policy line hold any value in retaining, given that it is a relatively static entry?

These are questions I can't begin to answer, hence the questions I asked in my original post:

  • Has anybody tackled this?
  • Does anybody want to? (Or more appropriately, is it possible to do so)
  • Are there any good teaching resources, more specifically, ones that are not behind a paywall?

I'll admit that I inappropriately over-reacted, but your response was not helpful.

You are right that there are a lot of resources out there that refer to it as the ELK Stack. And you are right that it sets the tone. But it's not intended to be snarky; it's so that we are all speaking the same language.
I'm sorry you feel it was a waste of time; that was not my intention.

You know the value you want to extract from the data, but not how to extract it.
We know the tools that can extract the data, but not what the structure is or what value means to you.

We're happy to help bridge that inherent gap by pointing out resources, but you need to help us help you by directing us accordingly.
If the provided answers are not what you want, that's 100% fine! Just clarify why not (like you have, so thank you for the extra info!) and we can steer the discussion toward more fruitful points and get you sorted out :slight_smile:

Moving on:

Elasticsearch is non-relational. But if you can expand on what the relationship implies, maybe there is an alternative way to capture it.
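
To make that concrete: if a whole report is parsed as one event, the split filter will fan it out into one event per record, and because split clones everything else on the event, the policy_published fields get copied onto every record document. That denormalization is the usual alternative to a join. A sketch, assuming the parsed report lands under a doc field with a [doc][record] array:

filter {
  xml {
    source => "message"
    target => "doc"
  }
  # one event per <record>; fields outside [doc][record],
  # including [doc][policy_published], are cloned onto each event
  split {
    field => "[doc][record]"
  }
}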

You know the format and the result you want to achieve, and you're going to be the best judge of the value of that data. My gut feeling is that if you are asking whether there's value, there likely isn't, and you are on the right track in discarding it :+1:

OK, I finally figured out how to get centralized pipeline management configured, and this is the pipeline I am using to ingest XML:

input {
  file {
    path => "C:/DMARC/*.xml"
    discover_interval => 5
  }
}

filter {
  xml {
    target => "doc"
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => "Elasticsearch:9200"
    user => "elastic"
    password => "elastic"
    http_compression => true
    manage_template => false
    index => "dmarcxml-%{+YYYY.MM.dd}"
  }
}
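
One caveat worth noting with this input: the file plugin emits one event per line by default, so if a report is pretty-printed across multiple lines, the xml filter only ever sees fragments. A multiline codec can stitch each file back into a single event; a sketch, assuming every report starts with an XML declaration:

input {
  file {
    path => "C:/DMARC/*.xml"
    discover_interval => 5
    # join every line that does NOT start a new XML declaration
    # onto the previous event, so one report = one event
    codec => multiline {
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

The auto_flush_interval makes sure the last report is emitted even though no new <?xml line ever follows it.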

The next hurdle is figuring out how to create fields. I tried the Grok Debugger, but I couldn't make heads or tails of what it was saying. How do I associate values inside an XML tag with a field name? Below is what a record looks like when ingested. If someone could give me a simple one or two lines, I think I can "reverse engineer" it to figure out what has to be done.

If it matters any, these are the mappings I am thinking of using for each of the XML tags.
source_ip: Sender IP
count: Message Count
disposition: DMARC Action
policy_spf: SPF Result
header_from: Sender's Header
dkim.domain: DKIM Domain
dkim.result: DKIM Result
spf.domain: SPF Domain
spf.scope: SPF Scope
spf.result: SPF Result

Are you trying to break out the details in fields like doc.auth_results?

Yes, doc.auth_results, doc.identifiers, doc.row all have information that I want in separate fields.

I guess some of the raw data going into Logstash would be helpful:

<record><row><source_ip>204.232.172.40</source_ip><count>1</count><policy_evaluated><disposition>none</disposition><spf>fail</spf></policy_evaluated></row><identifiers><header_from>test.com</header_from></identifiers><auth_results><dkim><domain>not.evaluated</domain><result>none</result></dkim><spf><domain>test.com</domain><scope>mfrom</scope><result>permerror</result></spf></auth_results></record>
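
Given that sample, the xml filter's xpath option is probably the simplest way to lift those values into the fields you listed. A sketch against your one-line <record> events; I've flattened the dotted names to underscores to avoid dotted-field headaches, and the paths would start at /feedback instead if a whole report arrives as one event:

filter {
  xml {
    source => "message"
    store_xml => false  # skip the full doc tree; keep only the xpath fields
    xpath => {
      "/record/row/source_ip/text()"                    => "source_ip"
      "/record/row/count/text()"                        => "count"
      "/record/row/policy_evaluated/disposition/text()" => "disposition"
      "/record/row/policy_evaluated/spf/text()"         => "policy_spf"
      "/record/identifiers/header_from/text()"          => "header_from"
      "/record/auth_results/dkim/domain/text()"         => "dkim_domain"
      "/record/auth_results/dkim/result/text()"         => "dkim_result"
      "/record/auth_results/spf/domain/text()"          => "spf_domain"
      "/record/auth_results/spf/scope/text()"           => "spf_scope"
      "/record/auth_results/spf/result/text()"          => "spf_result"
    }
  }
}

Depending on the plugin version, the xpath values may land as one-element arrays; a mutate filter with join can flatten them if so, and you can keep the target option alongside xpath if you still want the whole parsed document.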
