Logstash filter: find all occurrences and put in array

Hello,
For example, we have this log string:
2019-01-08 01:00:10 INFO Emails were sent from some@domain.com to a@domain.com, b@domain.com. The topic is cool. c@domain.com is ignored.
Is there a way to find all email addresses and put them into an array using a filter?
The expected result is:
"emails": ["some@domain.com", "a@domain.com", "b@domain.com", "c@domain.com"]

If someone held a gun to my head, I would implement this using:

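ruby {
    code => '
        msg = event.get("message")
        unless msg.nil?
            # Keep only the space-separated tokens that contain an "@"
            a = msg.split(" ").select { |x| x.include?("@") }
            # Strip a trailing comma or period left over from the sentence
            a = a.collect { |x| x.gsub(/[,.]$/, "") }
            # select never returns nil, so check for an empty result instead
            event.set("emails", a) unless a.empty?
        end
    '
}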

Error handling left as an exercise for the reader.
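To try the split-based approach outside Logstash, here is a rough plain-Ruby sketch of the same idea. `FakeEvent` and `extract_emails!` are hypothetical names made up for this example; `FakeEvent` only mimics the `get`/`set` calls the filter uses and is not the real Logstash event class.

```ruby
# Hypothetical stand-in for a Logstash event: only get/set are mimicked
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# Split on spaces, keep tokens containing "@", strip a trailing comma or period
def extract_emails!(event)
  msg = event.get("message")
  return if msg.nil?

  emails = msg.split(" ").select { |x| x.include?("@") }
  emails = emails.map { |x| x.gsub(/[,.]$/, "") }
  event.set("emails", emails) unless emails.empty?
end

line = "2019-01-08 01:00:10 INFO Emails were sent from some@domain.com " \
       "to a@domain.com, b@domain.com. The topic is cool. c@domain.com is ignored."
event = FakeEvent.new({ "message" => line })
extract_emails!(event)
p event.get("emails")
# => ["some@domain.com", "a@domain.com", "b@domain.com", "c@domain.com"]
```

Splitting on spaces is simple but leaves addresses wrapped in other punctuation, such as `<a@domain.com>`, with that punctuation still attached.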

We implemented a similar process for capturing email addresses in our helpdesk tickets.

GROK match:
You can certainly do all of this in Ruby if you prefer.
NOTE: make sure to set break_on_match => false so grok keeps trying the remaining patterns instead of stopping at the first successful match.

match => { "notes" => ["%{EMAILADDRESS:email_list}"] }
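For comparison, in plain Ruby (this is not grok itself), a single regex match returns only the leftmost hit, while `String#scan` returns every non-overlapping match. As far as I know, grok runs each pattern as one regex match. The `simple_email` pattern below is a deliberately simplified, hypothetical stand-in for illustration, not the EMAILADDRESS pattern.

```ruby
line = "2019-01-08 01:00:10 INFO Emails were sent from some@domain.com " \
       "to a@domain.com, b@domain.com. The topic is cool. c@domain.com is ignored."

# Simplified stand-in pattern (illustration only, not RFC 5322)
simple_email = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/

# A single match stops at the leftmost hit...
first_hit = line.match(simple_email)[0]

# ...while scan keeps going and returns every non-overlapping hit
all_hits = line.scan(simple_email)

puts first_hit # some@domain.com
p all_hits     # ["some@domain.com", "a@domain.com", "b@domain.com", "c@domain.com"]
```

The pattern has no capturing groups, so `scan` returns plain strings; with capturing groups it would return arrays of the captured groups instead.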

Pattern for RFC 5322 Official Standard (99% accurate for us):

EMAILADDRESS (?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Ruby code to remove duplicates:

ruby {
    id => ""
    code => "
        emails = event.get('email_list')
        # Only an Array (more than one email) needs deduplicating; nil or a single String is left alone
        event.set('email_list', emails.uniq) if emails.is_a?(Array)
    "
}
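The dedupe step checks for an Array because a field holding a single capture is typically a plain String, and `String` has no `uniq` method, so calling it would raise `NoMethodError`. Here is a tiny standalone sketch of the same check; `dedupe_email_list` is a hypothetical helper name, not part of Logstash.

```ruby
# Hypothetical helper mirroring the dedupe filter: only Arrays are deduplicated,
# a single String (or nil) is returned unchanged
def dedupe_email_list(value)
  value.is_a?(Array) ? value.uniq : value
end

p dedupe_email_list(["a@domain.com", "b@domain.com", "a@domain.com"])
# => ["a@domain.com", "b@domain.com"]
p dedupe_email_list("a@domain.com")
# => "a@domain.com"
p dedupe_email_list(nil)
# => nil
```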
