Grok for email log


(Vikash Singh) #1

Can you please help me build a grok pattern for an SMTP log?
Here is an example of my SMTP log:

Return-Path: alok@XXXXXX.in
Delivered-To: bigbrother@XXXX.in
Received: from n-mail-1.xxxxx.in
by n-mail-1.xxxxx.in with LMTP id uOV8CFLmalxdRQAADPfE8A
for bigbrother@xxxxx.in; Mon, 18 Feb 2019 22:37:30 +0530
Received: from Spamfilter-3.xxxx.in (Spamfilter-3.rrcat.gov.in [10.11.108.104])
by n-mail-1.rrcat.gov.in (Postfix) with ESMTP id 1B245263926;
Mon, 18 Feb 2019 22:37:30 +0530 (IST)
Received: from spamfilter-3.xxxx.in (localhost [127.0.0.1])
by Spamfilter-3.xxxx.in (Postfix) with ESMTP id 04FE12DA1D2;
Mon, 18 Feb 2019 22:37:30 +0530 (IST)
X-Virus-Scanned: amavisd-new at rrcat.xxxx.in
X-Spam-Flag: NO
X-Spam-Score: 1.163
X-Spam-Level: *
X-Spam-Status: No, score=1.163 tagged_above=-999 required=6
tests=[ALL_TRUSTED=-1, DEAR_SOMETHING=1.731, INVALID_DATE=0.432]
autolearn=no autolearn_force=no
Received: from Spamfilter-3.xxxx.in ([127.0.0.1])
by spamfilter-3.xxxx.in (spamfilter-3.xxxx.in [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id bYziy9Hx1yoW; Mon, 18 Feb 2019 22:37:29 +0530 (IST)
Received: from localhost (localhost [127.0.0.1])
by Spamfilter-3.rrcat.gov.in (Postfix) with ESMTP id 900F22D6BD6;
Mon, 18 Feb 2019 22:37:29 +0530 (IST)
From: noreply@xxxx.in
To: mkonline1996@gmail.com
Date: Mon, 18 Feb 19 17:05:17 +0000
Subject: Mail from Trade Apprenticeship Scheme at xxxxx (TASAR-2019)
Message-Id: 20190218170729.900F22D6BD6@Spamfilter-3.xxxx.in

Dear Candidate,

Your password for Online Application Submission for the Apprenticeship Program of trade Electrician against
Advertisement No.xxxxxxxx and trade code I-8 is japupu3ys

Login name is same as your Email ID.


This is system generated mail, please do not reply to it.


#2

For each header that you want to extract, use a pattern that matches zero or more characters that are not a newline, followed by a newline.

    grok { match => { "message" => "^Header: (?<Header>[^\n]*)\n" } }

Then you can give the grok filter an array of patterns to match.
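To show what each of those patterns does, here is a minimal plain-Ruby sketch that runs outside Logstash. It applies one `^Name: ([^\n]*)\n` regex per header to the whole message and keeps every header that matches, which is roughly what grok does with an array of patterns when `break_on_match` is false. The `extract_headers` helper and the shortened sample message are hypothetical, for illustration only.

```ruby
# Hypothetical helper: extract selected single-line headers from a raw
# message with one "^Name: ([^\n]*)\n" regex per header name.
def extract_headers(message, names)
  names.each_with_object({}) do |name, fields|
    # Regexp.escape protects header names that contain regex metacharacters.
    match = message.match(/^#{Regexp.escape(name)}: ([^\n]*)\n/)
    # As with grok and break_on_match => false, a header that is absent is
    # skipped and the remaining patterns are still tried.
    fields[name] = match[1] if match
  end
end

sample = <<~MSG
  From: noreply@xxxx.in
  Date: Mon, 18 Feb 19 17:05:17 +0000
  Subject: Mail from Trade Apprenticeship Scheme at xxxxx (TASAR-2019)

  Dear Candidate,
MSG

p extract_headers(sample, ["Date", "Subject", "X-Spam-Level"])
```

Because Ruby's `^` matches at the start of any line, each pattern finds its header wherever it appears in the message; `X-Spam-Level` is missing from this shortened sample, so it is simply left out of the result.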


(Vikash Singh) #3

Please elaborate a little bit more.


#4

I assume that you are consuming the entire log as a single event using a multiline codec. If you are consuming it one line at a time, things will be a little different. Decide which headers you want to parse and use the pattern I described to extract them. For example:

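    grok {
        break_on_match => false
        # Regex group names may only contain word characters, so the
        # X-Spam-* fields use underscores instead of hyphens.
        match => {
            "message" => [
                "^Date: (?<Date>[^\n]*)\n",
                "^Subject: (?<Subject>[^\n]*)\n",
                "^X-Spam-Level: (?<X_Spam_Level>[^\n]*)\n",
                "^X-Spam-Status: (?<X_Spam_Status>[^\n]*)\n"
            ]
        }
    }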

Once you have extracted them, you may need additional parsing to pull individual fields out of those headers.
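As an example of that additional parsing, here is a plain-Ruby sketch that splits an X-Spam-Status value into a hash. It assumes you already have the full, unfolded header value (the grok pattern above keeps only its first line), and the `parse_spam_status` helper is hypothetical. Logstash also has a kv filter that may cover much of this key=value text; the sketch just shows the logic.

```ruby
# Hypothetical helper: turn an unfolded X-Spam-Status value into a hash.
def parse_spam_status(value)
  flag, rest = value.split(",", 2)
  rest ||= ""
  fields = { "flag" => flag.to_s.strip }

  # tests=[...] contains commas and spaces, so pull it out before the
  # generic key=value scan and store each test name with its score.
  if (tests = rest[/tests=\[([^\]]*)\]/, 1])
    fields["tests"] = tests.split(",").to_h do |pair|
      name, score = pair.strip.split("=", 2)
      [name, score.to_f]
    end
    rest = rest.sub(/tests=\[[^\]]*\]/, "")
  end

  # Remaining tokens look like key=value, e.g. score=1.163 required=6.
  rest.scan(/(\w+)=(\S+)/) { |key, val| fields[key] = val }
  fields
end

status = "No, score=1.163 tagged_above=-999 required=6 " \
         "tests=[ALL_TRUSTED=-1, DEAR_SOMETHING=1.731, INVALID_DATE=0.432] " \
         "autolearn=no autolearn_force=no"
p parse_spam_status(status)
```

Scalar values are left as strings so that non-numeric ones such as `autolearn=no` survive; only the per-test scores are converted to floats.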

For Received you cannot do this with grok, because the header occurs multiple times and a grok pattern only captures its first match. However, it is simple to do a similar regex match in a ruby filter:

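    ruby {
        # scan returns one array per match; flatten turns that into a plain
        # array of strings. The trailing newline is left outside the capture.
        code => 'event.set("Received", event.get("message").scan(/^Received: ([^\n]+)\n/).flatten)'
    }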

That will set Received to an array of five strings, one per Received header. Note that each string holds only the first line of its header; folded continuation lines are not included.
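If you also want the continuation lines, one option is to unfold the headers first. This plain-Ruby sketch assumes that, as RFC 5322 specifies for folded headers, each continuation line begins with a space or tab (the sample in the first post shows no leading whitespace, so check your raw log). The `received_headers` helper and the test message are hypothetical.

```ruby
# Collect every Received header from a raw message, joining folded
# continuation lines (lines starting with a space or tab) into one string.
def received_headers(message)
  # The header section ends at the first empty line; ignore the body.
  header_block = message.split(/\r?\n\r?\n/, 2).first.to_s
  # Unfold: a line break followed by whitespace continues the previous header.
  unfolded = header_block.gsub(/\r?\n[ \t]+/, " ")
  # '.' does not match a newline, so each capture stops at the end of its
  # header; rstrip drops a stray "\r" from CRLF input.
  unfolded.scan(/^Received: (.*)$/).flatten.map(&:rstrip)
end

raw = "Received: from n-mail-1.xxxxx.in\n" \
      "\tby n-mail-1.xxxxx.in with LMTP id uOV8CFLmalxdRQAADPfE8A\n" \
      "\tfor bigbrother@xxxxx.in; Mon, 18 Feb 2019 22:37:30 +0530\n" \
      "Received: from localhost (localhost [127.0.0.1])\n" \
      "\tby Spamfilter-3.rrcat.gov.in (Postfix) with ESMTP id 900F22D6BD6;\n" \
      "\tMon, 18 Feb 2019 22:37:29 +0530 (IST)\n" \
      "Subject: test\n" \
      "\n" \
      "Received: this line is in the body, not a header\n"

received_headers(raw).each { |r| puts r }
```

Inside a Logstash ruby filter, the same logic could be applied to `event.get("message")` and the result stored with `event.set("Received", ...)`, as in the filter above.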


(Vikash Singh) #5

Actually, I am new to ELK and I still do not understand how to use the multiline filter.


#6

How are you ingesting the log?


(Vikash Singh) #7

Via Filebeat.


#8

Do you want to ingest the entire file as a single event?


(Vikash Singh) #9

Yes!!


#10

If Filebeat's multiline option works the same way as the multiline codec, then you should be able to do that by specifying a pattern that never matches:

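    multiline.pattern: '^Spalanzani'
    multiline.negate: true
    multiline.match: after

With negate set to true and match set to after, consecutive lines that do not match the pattern are appended to the preceding line, so a pattern that never matches collects the whole file into one event. Keep in mind that Filebeat caps a multiline event at multiline.max_lines (500 by default) and sends it once multiline.timeout (5s by default) expires, so a long or slowly written file may still be truncated or split.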
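To see why a never-matching pattern yields a single event, here is a simplified Ruby model of line grouping with negate: true and match: after. It is an illustration under stated assumptions, not Filebeat's actual code: `group_lines` is hypothetical, `max_lines` mirrors multiline.max_lines, and the timeout is not modelled.

```ruby
# Simplified model (not Filebeat's implementation) of multiline grouping
# with negate: true and match: after. A line that matches the pattern
# starts a new event; every non-matching line is appended to the current one.
def group_lines(lines, pattern, max_lines: 500)
  events = []
  current = nil
  lines.each do |line|
    if current.nil? || line.match?(pattern)
      events << current if current
      current = [line]
    elsif current.size < max_lines
      current << line
    end
    # Otherwise the event already holds max_lines lines and the extra
    # line is discarded.
  end
  events << current if current
  events.map { |event| event.join("\n") }
end

lines = ["Return-Path: alok@XXXXXX.in", "Delivered-To: bigbrother@XXXX.in", "", "Dear Candidate,"]
p group_lines(lines, /^Spalanzani/).size
```

Since `/^Spalanzani/` never matches, only the first line starts an event and all later lines are appended to it.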