Email grok filter

Hello everyone, I need your help. My task is to build filters for email messages that extract specific information such as IP address, subject, date, email, phone, address, etc., using Logstash filters.
If anyone has an idea or can help me, I would be very grateful.

Here is an example of an email from which I want to extract the information:

From: IP-Echelon Compliance p2p@copyright.ip-echelon.com
Sent: vendredi, mai 11, 2018 5:11 p.m.
Subject: Notice of Claimed Infringement - Case ID 7c8bcac0b90573e8e121
To: Noc_Isp nocisp.oma@orange.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Notice ID: 7c8bcac0b90573e8e121
Notice Date: 2018-05-11T16:12:27Z

Meditelecom

Dear Sir or Madam:

We are contacting you on behalf of Viacom International Inc. (Viacom). Under penalty of perjury, I assert that IP-Echelon Pty., Ltd., (IP-Echelon) is authorized to act on behalf of the owner of the exclusive copyrights that are alleged to be infringed herein.

IP-Echelon has become aware that the below IP addresses have been using your service for distributing video files, which contain infringing video content that is exclusively owned by Viacom.

IP-Echelon has a good faith belief that the Viacom video content that is described in the below report has not been authorized for sharing or distribution by the copyright owner, its agent, or the law. I also assert that the information contained in this notice is accurate to the best of our knowledge.

We are requesting your immediate assistance in removing and disabling access to the infringing material from your network. We also ask that you ensure the user and/or IP address owner refrains from future use and sharing of Viacom materials and property.

In complying with this notice, Meditelecom should not destroy any evidence, which may be relevant in a lawsuit, relating to the infringement alleged, including all associated electronic documents and data relating to the presence of infringing items on your network, which shall be preserved while disabling public access, irrespective of any document retention or corporate policy to the contrary.

Please note that this letter is not intended as a full statement of the facts; and does not constitute a waiver of any rights to recover damages, incurred by virtue of any unauthorized or infringing activities, occurring on your network. All such rights, as well as claims for other relief, are expressly reserved.

Should you need to contact me, I may be reached at the following address:

Adrian Leatherland
On behalf of IP-Echelon as an agent for Viacom
Address: 7083 Hollywood Blvd., Los Angeles, CA 90028, United States
Email: p2p@copyright.ip-echelon.com

Evidentiary Information:
Protocol: BITTORRENT
Infringed Work: RuPaul's Drag Race
Infringing FileName: RuPaul's Drag Race Season 1
Infringing FileSize: 3181198433
Infringer's IP Address: 196.125.173.100
Infringer's Port: 62141
Initial Infringement Timestamp: 2018-05-11T16:12:26Z

<?xml version="1.0" encoding="UTF-8"?> 7c8bcac0b90573e8e121 Open Normal Viacom Inc. IP-Echelon - Compliance 6715 Hollywood Blvd Los Angeles CA 90028 United States of America +1 (310) 606 2747 p2p@copyright.ip-echelon.com Meditelecom nocisp.oma@orange.com 2018-05-11T16:12:26Z 196.125.173.100 62141 BitTorrent 1 2018-05-11T16:12:26Z RuPaul"s Drag Race RuPaul"s Drag Race Season 1 3181198433 08bc694b3d7d7365d3ac6d6905faa274c792b702 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1

iQEcBAEBAgAGBQJa9cDrAAoJEN5LM3Etqs/WRUYH/RMUmMw1YvAxg+Uwlcxvu138
nB1j9/mEceVrxwsqMGmjfCGSKfqAYUUue51yh6MQ4ppjwkSmTm7Kewrpipzm3Yf6
zPUPj5mmLmKsmlhXwMRM/V7vkSSPvdk14v/IFwXo3vCvnWMUmOj8RkEKA48nLTRL
MxtlPYEMW2afHdY2qOPY/x1PRDWW82ZQi3RgXuSqN9FYmLMQFDUptTh4GjALGgNi
NIiv3bl3qNX1bazv533zUgj34NoWepUA4+URgKoeVEHC4LPotTeawC89hGEDnCoe
U//ZasJLrmDQKcL2GyfDV7GlXCvfaOD1EXJ5QLA/3mZpEHc6/HB5+o+YTb/WNhU=
=MMZV
-----END PGP SIGNATURE-----

Are you looking to parse email in general, or just Echelon messages? iirc, they include a structured version of the message as an attachment, which is much easier to parse. Failing that, you can use some (expensive) grok matches, but you'll want to do some pre-processing first.

Thanks, Dave Mrtin.
Yes, I'm looking to parse email in general. I want to analyze all the mail in my mailbox.

You might look at parsing your mail logs instead of the messages. Some of what you mentioned is content, but the rest is in the headers.

If that won't work, pull just the headers (or the subset of them that you need) and run them through a grok.
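
As a rough sketch of the header-only approach, assuming each event's message field already holds just the header block of one email, with one header per line as in the example above. The field names under [email] are illustrative, not anything Logstash defines, and the patterns would need adjusting for folded or multi-recipient headers.

```
filter {
  grok {
    # Try every pattern instead of stopping at the first one that matches.
    break_on_match => false
    match => {
      "message" => [
        # "^" anchors at the start of a line; ".*?" skips the display name.
        "^From:\s.*?%{EMAILADDRESS:[email][from]}",
        "^To:\s.*?%{EMAILADDRESS:[email][to]}"
      ]
    }
  }
}
```

Against the sample above, this should put p2p@copyright.ip-echelon.com in [email][from] and nocisp.oma@orange.com in [email][to].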

If you don't need to keep the various values synced (you care about the to and from addresses, but not the to-from pairings), you can do that with a grep in your mailbox dir. ( egrep "^To: " * )

I think you did not quite understand me. I've already written a few grok filters, but they have not worked well. I want to see examples of grok filters for emails that pull out what is inside the emails.

Example:

I want to pull the email, address, subject, record, and all the fields of the attached XML file.

From: IP-Echelon Compliance p2p@copyright.ip-echelon.com
Sent: vendredi, mai 11, 2018 1:29 p.m.
Subject: Notice of Claimed Infringement - Case ID 682fbc3494b8400badf3
To: Noc_Isp nocisp.oma@orange.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Notice ID: 682fbc3494b8400badf3
Notice Date: 2018-05-11T12:30:09Z

Meditelecom

Dear Sir or Madam:

We are contacting you on behalf of Paramount Pictures Corporation (Paramount). Under penalty of perjury, I assert that IP-Echelon Pty. Ltd., (IP-Echelon) is authorized to act on behalf of the owner of the exclusive copyrights that are alleged to be infringed herein.

IP-Echelon has become aware that the below IP addresses have been using your service for distributing video files, which contain infringing video content that is exclusively owned by Paramount.

IP-Echelon has a good faith belief that the Paramount video content that is described in the below report has not been authorized for sharing or distribution by the copyright owner, its agent, or the law. I also assert that the information contained in this notice is accurate to the best of our knowledge.

We are requesting your immediate assistance in removing and disabling access to the infringing material from your network. We also ask that you ensure the user and/or IP address owner refrains from future use and sharing of Paramount materials and property.

In complying with this notice, Meditelecom should not destroy any evidence, which may be relevant in a lawsuit, relating to the infringement alleged, including all associated electronic documents and data relating to the presence of infringing items on your network, which shall be preserved while disabling public access, irrespective of any document retention or corporate policy to the contrary.

Please note that this letter is not intended as a full statement of the facts; and does not constitute a waiver of any rights to recover damages, incurred by virtue of any unauthorized or infringing activities, occurring on your network. All such rights, as well as claims for other relief, are expressly reserved.

Should you need to contact me, I may be reached at the following address:

Adrian Leatherland
On behalf of IP-Echelon as an agent for Paramount
Address: 7083 Hollywood Blvd., Los Angeles, CA 90028, United States
Email: p2p@copyright.ip-echelon.com

Evidentiary Information:
Protocol: BITTORRENT
Infringed Work: Deep Impact
Infringing FileName: Deep Impact (1998)
Infringing FileSize: 892604446
Infringer's IP Address: 196.125.218.34
Infringer's Port: 8999
Initial Infringement Timestamp: 2018-05-11T12:29:50Z

<?xml version="1.0" encoding="UTF-8"?> 682fbc3494b8400badf3 Open Normal Paramount Pictures Corporation IP-Echelon - Compliance 6715 Hollywood Blvd Los Angeles CA 90028 United States of America +1 (310) 606 2747 p2p@copyright.ip-echelon.com Meditelecom nocisp.oma@orange.com 2018-05-11T12:29:50Z 196.125.218.34 8999 BitTorrent 1 2018-05-11T12:29:50Z Deep Impact Deep Impact (1998) 892604446 54143cea61813aa2bdc389f70841f7db144951a5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1

For the items in the header, you can use a pattern something like this:

"\nSubject:\s%{GREEDYDATA:[event][subject]}"

You can put several of them in the same grok if you set "break_on_match" to false. You'll also want to ensure that you ingest the email as a single multi-line document, not individual lines. However, depending on your source format, there may be other tools that work better (procmail, formail, and maybe even grep).
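
Putting those two points together might look like the sketch below. It assumes the messages sit in plain-text files where every email starts with a "From: " line, as in your sample. The path and the field names other than [event][subject] are made up for illustration. The multiline rule will split wrongly if a body line happens to start with "From: ".

```
input {
  file {
    # Hypothetical location of the exported messages.
    path => "/var/tmp/mail/*.txt"
    start_position => "beginning"
    codec => multiline {
      # A line that does not start with "From: " is appended to the previous
      # event, so each event runs from one "From: " line up to the next.
      pattern => "^From: "
      negate => true
      what => "previous"
      # Emit the last buffered email after 2 seconds without new lines.
      auto_flush_interval => 2
    }
  }
}

filter {
  grok {
    # Keep going after the first match so every pattern gets a chance.
    break_on_match => false
    match => {
      "message" => [
        "\nSubject:\s%{GREEDYDATA:[event][subject]}",
        "\nNotice ID:\s%{NOTSPACE:[event][notice_id]}",
        "\nInfringer's IP Address:\s%{IP:[event][infringer_ip]}",
        "\nInfringer's Port:\s%{INT:[event][infringer_port]}"
      ]
    }
  }
}
```

The multiline codec caps an event at 500 lines by default, which is plenty for notices like this one but may need raising (max_lines) for longer mail.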

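For the information in the XML attachment, pull it out either with an outside tool or with a grok (something like "<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>%{GREEDYDATA:[event][xmldata]}" maybe; the ? and . characters need escaping because grok patterns are regular expressions, and you might have to use the MIME tags as delimiters), then run the result through the xml filter.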
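
A sketch of that second route, assuming the whole email is already one multi-line event and that the XML sits inline between the "<?xml" declaration and the PGP signature marker, as it appears in your sample. The field names xmldata and notice are placeholders. If the attachment is base64 or quoted-printable encoded in the raw message, it would need decoding before this step.

```
filter {
  grok {
    # (?m) lets "." match line breaks; the lazy ".*?" stops at the first
    # PGP signature marker that follows the XML.
    match => {
      "message" => "(?m)(?<xmldata><\?xml\s.*?)\s*-----BEGIN PGP SIGNATURE-----"
    }
  }
  xml {
    # Parse the captured text and store the element tree under "notice".
    source => "xmldata"
    target => "notice"
  }
}
```

If the captured text is not well-formed XML, the xml filter tags the event with _xmlparsefailure instead of populating the target, which is a quick way to check whether the delimiters are right.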

Thank you, Dave Mrtin, for your help. I will try it and show you what I get afterwards.
For the moment, I have worked on some logs and got a good result. Here are the logs:

10.121.123.104 - - [01/Nov/2012:21:01:04 +0100] "GET /cluster HTTP/1.1" 200 1272
10.121.123.104 - - [01/Nov/2012:21:01:17 +0100] "GET /cpc/auth.do?loginsetup=true&targetPage=%2Fcpc%2F HTTP/1.1" 302 466
10.121.123.104 - - [01/Nov/2012:21:01:18 +0100] "GET /cpc?loginsetup=true&targetPage=%252Fcpc%252F HTTP/1.1" 302 -
10.121.123.104 - - [01/Nov/2012:21:01:18 +0100] "GET /cpc/auth.do?loginsetup=true&targetPage=%25252Fcpc%25252F&loginsetup=true HTTP/1.1" 302 494

Here is the config file (Logstash):
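input {
  file {
    path => "C:\elk\logstash\file.log"
    start_position => "beginning"
    # Do not persist the read position, so the file is re-read on every run.
    sincedb_path => "nul"
  }
}
filter {
  grok {
    # Access-log line: client, ident, auth, [timestamp], "request", status, bytes.
    # The literal brackets and double quotes must be escaped inside the pattern.
    match => { "message" => "^%{IPORHOST:clientip} (?:-|%{USER:ident}) (?:-|%{USER:auth}) \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|-)\" %{NUMBER:response} (?:-|%{NUMBER:bytes})" }
  }
}
output {
  elasticsearch {
    index => "emailtest"
    document_type => "email"
    hosts => "localhost:9200"
  }
}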
