Parsing emails with Logstash

Hi, everyone! I'm new to ELK, so I need help =)
Could someone help me?
I'm trying to extract some data from my email messages. I'm using the imap input plugin, but what should I use in the filter?

For example, if I use the rubydebug codec in the stdout output, I see this:
"message-id" => "",
"from" => "",
"x-ms-exchange-organization-authas" => "Anonymous",
"x-ms-exchange-organization-authsource" => "",
"date" => "",
"@version" => "1",
"@timestamp" => ,
"to" => "",
"subject" => "Alert Summary: ",
"received" => "",
"content-type" => "text/html; charset=UTF-8",
"mime-version" => "1.0",
"message" => "SOME HTML TAGS",
"return-path" => "",
"content-transfer-encoding" => "7bit",
"type" => "new_type"

Now, when I see something in "subject", I need to parse the HTML tags. How can I do this?

My config file looks like this:

input {
  imap {
    type => "new_type"
    host => ""
    user => "user"
    password => "pass"
    port => 143
    secure => false
    check_interval => 360
    delete => false
    folder => "Inbox/Folder"
  }
}

filter {
  if [type] == "new_type" {
  }
}

output {
  if [type] == "new_type" {
    elasticsearch {
      hosts => ["host:9200"]
      index => "new_type_index"
    }
  }
}

The subject field doesn't contain any HTML tags so I don't understand the question. Are you talking about the message body?

Hi, Magnus! Thanks for the reply!
Yes, I'm talking about the message body.

Logstash has no built-in filter for HTML stripping. While you might be able to ignore the fact that it's HTML and just treat it as plain text (it depends on what you want to do), I'd probably write a custom filter plugin for stripping the HTML markup from the field.
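
For anyone who wants to try the plain-text route before writing a plugin, here is a minimal sketch that strips tags with the mutate filter's gsub option. This is a swapped-in shortcut, not the custom plugin suggested above. The regex is crude: it does not decode entities such as &nbsp; and can be fooled by a ">" inside an attribute value. The subject pattern is an assumption based on the sample event, so adjust it to whatever you actually want to match.

```
filter {
  # Assumption: only mails whose subject starts with "Alert Summary" need this.
  if [type] == "new_type" and [subject] =~ /^Alert Summary/ {
    mutate {
      gsub => [
        # Replace anything that looks like an HTML tag with a space.
        "message", "<[^>]*>", " ",
        # Collapse the runs of whitespace that the first substitution leaves behind.
        "message", "\s+", " "
      ]
      # Trim leading and trailing whitespace from the result.
      strip => ["message"]
    }
  }
}
```

Whether this is good enough depends on how regular the HTML is. For anything with nested tables or inline scripts, a real HTML parser inside a custom plugin is likely the safer choice.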

Did I understand correctly that I must write a regex if I want to extract some data from the message? Or can I use something else? Because it would be a very large regex =)
Could you please write an example config file for this case?

Did I understand correctly that I must write a regex if I want to extract some data from the message?

It depends on what the HTML looks like and what you want to extract from it.

Ohhh, OK,
thanks for the quick answers! I will read up on how to write my own plugins. I like the idea of deleting the HTML tags and then parsing the text itself. I hope it will not be very difficult...

Magnus! Can you help me again? =)
I'm stuck again. How can I parse the text now? I've made it so that all the keywords are separated by commas, and I know for sure that they will always be in the same order. How do I now pull out certain keywords and assign them to other fields?
For example:

"Some text ,some text,INEED-THIS-ONE,next text,AND-I-ALSO-NEED-THIS "

You have several options: a csv or dissect filter would probably be the easiest.
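
A rough sketch of both options against the sample line above. The field names keyword_one and keyword_two and the csv column names are made up for illustration, and both variants assume the comma-separated text is in the message field. Use one filter block or the other, not both.

```
# Option 1: dissect. An empty %{} key skips a part you don't need.
filter {
  dissect {
    mapping => {
      "message" => "%{},%{},%{keyword_one},%{},%{keyword_two}"
    }
  }
  # The sample has stray spaces around some values, so trim the extracted fields.
  mutate {
    strip => ["keyword_one", "keyword_two"]
  }
}

# Option 2: csv. Name every column, then drop the ones you don't need.
filter {
  csv {
    source => "message"
    separator => ","
    columns => ["text_one", "text_two", "keyword_one", "text_three", "keyword_two"]
    remove_field => ["text_one", "text_two", "text_three"]
  }
  mutate {
    strip => ["keyword_one", "keyword_two"]
  }
}
```

dissect is purely positional and does no quote handling, which fits the "always in the same order" case. csv is the better fit if a value can itself contain a comma inside quotes.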

Magnus! Thanks for the help!
