Grok Multiline file

Hi there,

I'm trying to grok mails that were not sent. I would like each file to be read as a single log event with 5 fields: From, To, Date, Subject, and everything else as Message.

Here is what it looks like:

From: TEXT
To: TEXT
Date: Thu, 30 Jul 2020 19:00:25 +0200
TEXT
TEXT
	TEXT
TEXT
TEXT
Subject: TEXT
TEXT
TEXT
	TEXT
etc...

How can I write my grok so that it extracts the From field, then the To field, then the Date field, puts everything between Date and Subject into Message, then extracts Subject, and puts everything after it into Message as well?

I'm using a pipeline with 10 filters, selected by a field I add in filebeat.yml. So my first question is: can I use the multiline codec with a condition in the input of my pipeline?

Right now it looks like this:

input {
  beats {
    port => 5047
    client_inactivity_timeout => 1200
  }
}

and I would like to do this:

input {
  beats {
    port => 5047
    client_inactivity_timeout => 1200
  }
  if [fields][log_type] == "bad-mail" {
    codec => multiline {
      pattern => "From:%{DATA:from} To:%{DATA:to} Date:%{DATA:date} %{GREEDYDATA:message} Subject:%{DATA:subject} %{GREEDYDATA:message}"
      negate => "false"
    }
  }
}

I want to use multiline only for this log_type. I also don't know whether I can put \n in the pattern or need something else.

I've looked at a lot of examples and other posts on this website and elsewhere, but I haven't found the solution yet :confused:. I'm still testing the filters in both Filebeat and Logstash.

Thanks in advance for your time and your help. Have a great day,

Louis Vince.

No, you cannot. The codec is used to generate events, so it cannot be conditional on events which only exist after the codec has been applied.
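
A minimal sketch of where such a condition can go instead: conditionals are evaluated per event, so they belong in the filter (or output) section, after the input has produced the event. The beats settings are copied from the question; the tag name is only an illustration, and the field reference assumes log_type is set under fields in filebeat.yml.

```
input {
  beats {
    port => 5047
    client_inactivity_timeout => 1200
  }
}

filter {
  # The event already exists here, so it can be tested.
  if [fields][log_type] == "bad-mail" {
    # Hypothetical tag, used only to show that the branch is taken.
    mutate { add_tag => ["bad-mail"] }
  }
}
```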

Ok, thanks for your answer @Badger. So I have to put multiline.pattern and the other settings into my filebeat.yml.
But I can't find a pattern that matches what I need. I can match the lines containing "From:", "To:", "Subject:" and "Date:", but I don't know how to express "take the rest of the document into a Message field". The number of lines and the content will be different for every mail.

Moreover, the mail can be a reply to another mail, so it can contain:

From:
To:
Date:
TExt ....
Subject:
text
text 
...
... 
         from:
         to:
         etc....

Those fields can even be at the same level of indentation, but I only want the "From:", "To:", "Subject:" and "Date:" fields at the top of the mail. I understand, and I'm not asking you to write the regex for me; I only want to know whether it's possible :smiley:

Thanks again for your time, best regards,

Louis Vince

If you are saying that in the text

From:
To:
Date:
TExt ....
Subject:
text
text 
From:
To:
Date:
TExt ....
Subject:
text
text 

that the second From is sometimes a new mail message and sometimes a quoted message then no, I see no way for logstash or filebeat to determine which it is.

No no, one file is always one log. Sometimes the mail contains a quoted mail, but I don't want that quoted mail to become a separate log.

I've changed my pattern in the Grok Debugger and it seems to work; I will test it next Monday.

But my Message field isn't displayed as multiline, so it's not very clean to read when there are 40 lines shown as line 1 \n line 2 \n line 3...

I would like the \n to be rendered as line breaks when I display the data in a table or in the Discover interface in Kibana.

From: %{GREEDYDATA:From}\nTo: %{GREEDYDATA:To}\nDate: %{GREEDYDATA:Date}\n(?m)%{GREEDYDATA:Message}\nMessage-ID: %{DATA:MessageID}\nSubject: %{DATA:Subject}\n(?m)%{GREEDYDATA:Message}\n
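
A hedged sketch of how a pattern like this might be wrapped in a grok filter so that only the headers at the top of the mail are captured. It assumes the whole mail already arrives as one event in the message field (multiline handled in Filebeat), that the headers appear in the order From, To, Date, Subject, and that each header name is followed by a space. The field names headers_rest and body are made up for illustration. It is not tested against real mails.

```
filter {
  if [fields][log_type] == "bad-mail" {
    grok {
      match => {
        # (?m) lets "." match newlines; \A anchors at the start of the event.
        # DATA is lazy, so the first "Subject: " line ends headers_rest and a
        # quoted mail further down lands in body instead of being parsed.
        # Assumes config.support_escapes is left at its default (false), so
        # \n and \A reach the regex engine unchanged.
        "message" => "(?m)\AFrom: %{DATA:from}\nTo: %{DATA:to}\nDate: %{DATA:date}\n%{DATA:headers_rest}\nSubject: %{DATA:subject}\n%{GREEDYDATA:body}"
      }
    }
  }
}
```

Using two distinct field names instead of capturing Message twice avoids ending up with an array in one field. If a single Message field is wanted, the two parts could be joined afterwards.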
