Parsing Emails with logstash

M_B_I · April 18, 2018, 3:35pm

Hi, everyone! I'm new to ELK, so I need help =)
Could someone help me?
I'm trying to extract some data from my email messages. I used the imap input plugin, but what should I use in the filter?

For example, if I use rubydebug in stdout I see this:
"message-id" => "",
"from" => "",
"x-ms-exchange-organization-authas" => "Anonymous",
"x-ms-exchange-organization-authsource" => "",
"date" => "",
"@version" => "1",
"@timestamp" => ,
"to" => "",
"subject" => "Alert Summary: ",
"received" => "",
"content-type" => "text/html; charset=UTF-8",
"mime-version" => "1.0",
"message" => "SOME HTML TAGS",
"return-path" => "",
"content-transfer-encoding" => "7bit",
"type" => "new_type"

Now, if I see something specific in "subject", I want to parse the HTML tags. How can I do this?

My config file looks like this:

input {
  imap {
    type => "new_type"
    host => ""
    user => "user"
    password => "pass"
    port => 143
    secure => false
    check_interval => 360      # seconds between mailbox polls
    delete => false            # leave fetched messages on the server
    folder => "Inbox/Folder"
  }
}

filter {
  if [type] == "new_type" {
    # nothing here yet; this is the part I'm asking about
  }
}

output {
  if [type] == "new_type" {
    elasticsearch {
      hosts => ["host:9200"]
      index => "new_type_index"
    }
  }
}

magnusbaeck · April 19, 2018, 7:17am

The subject field doesn't contain any HTML tags so I don't understand the question. Are you talking about the message body?

M_B_I · April 19, 2018, 7:36am

Hi, Magnus! Thanks for the reply!
Yes, I'm talking about the message body.

magnusbaeck · April 19, 2018, 8:03am

Logstash has no built-in filter for HTML stripping. While you might be able to ignore the fact that it's HTML and just treat it as plain text (it depends on what you want to do) I'd probably write a custom filter plugin for stripping the HTML markup from the field.
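
For the plain-text route, a rough and untested sketch could use the mutate filter's gsub option to replace anything that looks like a tag with a space. This is a crude regex substitute, not a real HTML parser (it won't decode entities and can misfire on a ">" inside an attribute value), and the subject pattern is only a guess based on the "Alert Summary: " subject shown in the rubydebug output:

```
filter {
  # Only touch events from the imap input whose subject looks like an alert.
  # The subject pattern is an assumption; adjust it to the real subjects.
  if [type] == "new_type" and [subject] =~ /^Alert Summary/ {
    mutate {
      gsub => [
        # Replace anything shaped like <...> with a single space.
        "message", "<[^>]*>", " ",
        # Collapse the runs of whitespace left behind by the removed tags.
        "message", "[[:space:]]+", " "
      ]
      # Trim leading and trailing whitespace from the cleaned body.
      strip => ["message"]
    }
  }
}
```

Within a single mutate block gsub runs before strip, so the trimming sees the already cleaned text. If the markup is more complicated than simple tags, a custom plugin is likely the more robust choice.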

M_B_I · April 19, 2018, 8:33am

Did I understand correctly: must I write a regex if I want to extract some data from the message? Or can I use something else? Because it would be a very huge regex =)
Could you please write an example config file for this case?

magnusbaeck · April 19, 2018, 8:35am

Did I understand correctly: must I write a regex if I want to extract some data from the message?

It depends on what the HTML looks like and what you want to extract from it.

M_B_I · April 19, 2018, 8:44am

ohhh, ok,
thanks for the quick answers! I will read how to write my own plugins. I like the idea of deleting HTML tags and trying to parse the text itself. I hope it will not be very difficult....

M_B_I · April 19, 2018, 1:34pm

Magnus! Can you help me again? =)
I'm stuck again. How can I parse the text now? I made it so that all keywords are separated by commas, and I know for sure that they will always be in the same order. How do I now pull out certain keywords and assign them to other fields?
For example:

"Some text ,some text,INEED-THIS-ONE,next text,AND-I-ALSO-NEED-THIS "

magnusbaeck · April 19, 2018, 2:10pm

You have several options: a csv or dissect filter would probably be the easiest.
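
As an untested sketch for the five-part sample string above, a dissect mapping could skip the parts that aren't needed and keep the third and fifth. The field names first_keyword and second_keyword are made up for illustration, and this assumes the cleaned text is in the message field:

```
filter {
  dissect {
    mapping => {
      # %{} is a skip field: the text is matched but not stored.
      "message" => "%{},%{},%{first_keyword},%{},%{second_keyword}"
    }
  }
  # The last dissect field takes the rest of the line, so trim stray spaces.
  mutate {
    strip => ["first_keyword", "second_keyword"]
  }
}
```

The csv filter would do roughly the same thing by naming every column and then dropping the unwanted ones (column names are again hypothetical):

```
filter {
  csv {
    source => "message"
    separator => ","
    columns => ["text1", "text2", "first_keyword", "text3", "second_keyword"]
    # Discard the columns that were only there as padding.
    remove_field => ["text1", "text2", "text3"]
  }
}
```

dissect is purely positional and doesn't interpret quoting, so it is probably the simpler fit when the order never changes; csv is the better choice if values can themselves be quoted or contain commas.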

M_B_I · April 20, 2018, 12:16pm

Magnus! Thanks for the help!

system · May 18, 2018, 12:16pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
