How to capture XML files from email attachments using IMAP - for DMARC reporting


Is there a way to connect to an email account, download all emails, extract every attachment that is an XML file, parse the XML so that its data populates the event fields, and then discard the attachment?

I tried using the imap input plugin, but the attachment does not appear anywhere in the event that reaches the output, even when the attachment is a simple text file containing a single character.

I've seen articles saying that Elasticsearch has an "attachment" type that can be configured, so I imagine Logstash has a way to retrieve attachments and send them to Elasticsearch. That is not my end goal, though: I want to retrieve the XML file, parse it in the filter stage, and output only the individual XML fields.
I also know that there is an XML filter plugin, so parsing the XML should be possible once I have the file.

I need this to work with DMARC reports, which arrive as thousands of emails, each with an XML attachment.
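
For reference, the flow described above might be sketched as the pipeline below. This is only an illustration under assumptions, not a confirmed solution: the host, user, and password values are placeholders, and it assumes the report is attached as an uncompressed XML MIME part whose type is "text/xml", so that the imap input's content_type option selects that part as the event message. Compressed attachments would not be handled this way.

```
input {
  imap {
    host => "imap.example.com"     # placeholder host
    port => 993
    secure => true
    user => "dmarc@example.com"    # placeholder account
    password => "changeme"         # placeholder password
    folder => "INBOX"
    check_interval => 300
    # Assumption: the XML attachment is a MIME part of this type.
    # For multipart mail, the first part with this type becomes "message".
    content_type => "text/xml"
  }
}

filter {
  xml {
    source => "message"
    target => "dmarc"
    store_xml => true
    force_array => false
    # Drop the raw XML once it has been parsed successfully.
    remove_field => ["message"]
  }
}

output {
  stdout { codec => rubydebug }
}
```

Because remove_field is only applied when the filter succeeds, an event that fails to parse keeps its raw message, which makes it easier to inspect.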

Thank you,


"I figured out how to extract the XML data from the email, and now I'm looking into parsing it with the XML filter.
I've tried removing the initial XML declaration, <?xml version="1.0" encoding="UTF-8" ?>, but I still can't get the XML parsed into fields and values.

Here is the filter section of my config file:

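filter {
  # Split off the XML declaration ("discard") from the rest of the document ("xmldata").
  grok {
    match => { "message" => "(?<discard>.*?>)(?<xmldata>(.|\r|\n)*)" }
  }
  # Parse the remaining XML into the "dmarc" field.
  xml {
    source => "xmldata"
    store_xml => true
    target => "dmarc"
    force_array => false
  }
}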


Here is my output on stdout:
"discard" => "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>",
"from" => """",
"xmldata" => "\n\n 1.0\n <report_metadata>\n <org_name>chrobinson.c
om</org_name>\n\n <extra_contact_info>
.com</extra_contact_info>\n <report_id>8d0b16$</report_id>\n <date_range>\n
1540184463\n 1540270863\n </date_range>\n </report_metadata>\n <policy_publish
ed>\n\n r\n r\n


\n \n <
pct>100\n </policy_published>\n \n \n <source_ip></source_ip>\n 1</
count>\n <policy_evaluated>\n none\n fail\n pass<
/spf>\n </policy_evaluated>\n \n \n <header_from></header_from>\n <
envelope_from></envelope_from>\n \n <auth_results>\n \n ameriq\n mfrom\n pass\n \n </auth_results>\n </re
cord>\n \n \n <source_ip></source_ip>\n 1\n <policy_evaluated

\n none\n fail\n pass\n </policy_evaluated
\n \n \n <header_from></header_from>\n <envelope_from></en
velope_from>\n \n <auth_results>\n \n\n mfrom\n pass\n \n </auth_results>\n \n \n \n
<source_ip></source_ip>\n 1\n <policy_evaluated>\n none</d
isposition>\n fail\n pass\n </policy_evaluated>\n \n <identifiers
\n <header_from></header_from>\n <envelope_from></envelope_from>\n </identifiers
\n <auth_results>\n \n\n mfrom\n pass\n \n </auth_results>\n \n\n\n\n\n\n\n\n",
"references" => "",
"in-reply-to" => "",
"message-id" => "",
"return-path" => "",
"content-type" => "multipart/mixed; boundary=------------987CC2FFF12A20F12E0D79DA",
"date" => "Wed, 14 Nov 2018 12:02:40 -0500",
"x-forwarded-message-id" => "",
"@timestamp" => 2018-11-14T17:02:40.000Z,
"message" => "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n\n 1.0\n \n <org_name></org_name>\n\n <
extra_contact_info></extra_contact_info>\n <report_id>8d0b16$5fda2f7=ae0f234e44b15731@chr</report_id>\n <date_range>\n 1540184463\n 1540270863\n </date_rang
e>\n </report_metadata>\n <policy_published>\n\n r\n r</a


\n \n 100\n </policy_published>\n \n \n <source_ip</source_ip>\n 1\n <policy_evaluated>\n none\n
fail\n pass\n </policy_evaluated>\n \n \n</header_from>\n <envelope_from></envelope_from>\n \n <auth_re
sults>\n \n\n mfrom\n pass
\n \n </auth_results>\n \n \n \n <source_ip></source_ip>\n
1\n <policy_evaluated>\n none\n fail\n
pass\n </policy_evaluated>\n \n \n <header_from></header_fr
om>\n <envelope_from></envelope_from>\n \n <auth_results>\n \n\n mfrom\n pass\n \n </auth_resu
lts>\n \n \n \n <source_ip></source_ip>\n 1\n \n none\n fail\n pass\n </poli
cy_evaluated>\n \n \n <header_from></header_from>\n <envelope_from>ameri</envelope_from>\n \n <auth_results>\n \n\n
mfrom\n pass\n \n </auth_results>\n \n
"to" => "",
"mime-version" => "1.0",
"user-agent" => "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Thunderbird/60.3.0",
"@version" => "1",
"received" => "from [] ( []) by WIN-M4T2DLP9B1I with ESMT
PA ; Wed, 14 Nov 2018 12:02:40 -0500",
"subject" => "1103",
"content-language" => "en-US"

Can anybody help me figure out why XML is not being parsed?"
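
One way to narrow this down, offered as a sketch rather than a confirmed fix, is to pull a few values out with the xml filter's xpath option and then check whether the event carries the _xmlparsefailure tag, which the filter adds when it cannot parse the source. The paths below assume the root element is feedback, as in the DMARC aggregate report schema; the dmarc_* field names are hypothetical and chosen only for illustration.

```
filter {
  xml {
    source => "xmldata"
    # Only extract the listed values; do not store the whole document.
    store_xml => false
    xpath => {
      "/feedback/report_metadata/org_name/text()"  => "dmarc_org_name"
      "/feedback/report_metadata/report_id/text()" => "dmarc_report_id"
      "/feedback/policy_published/domain/text()"   => "dmarc_policy_domain"
      "/feedback/record/row/source_ip/text()"      => "dmarc_source_ip"
    }
  }
}
```

Values extracted this way may be stored as arrays, since a report can contain several record elements. If none of the dmarc_* fields appear and there is no _xmlparsefailure tag either, that would suggest the filter is not running on the event at all.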
