Logstash grok / dissect filter over array


#1

Hello,
we have a log line that we want to parse with Logstash that looks like this:

from=<noreply@xxx.de> to=<max.muster@abc.de> to=<fabian.muster@def.de> to=<hamid.muster@ghi.de>

To parse this line, we use the Logstash key-value Filter:

kv {
    source => "postfix_keyvalue_data"
    trim_value => "<>,"
    prefix => "postfix_"
    remove_field => [ "postfix_keyvalue_data" ]
}

The result in Elasticsearch is what we expect:

"postfix_to": [
  "max.muster@abc.de",
  "fabian.muster@def.de",
  "hamid.muster@ghi.de"
],
"postfix_from": "noreply@xxx.de"

Now we want an additional field for each e-mail field that contains only the domain part of every address.
The following should be added to the document:

"postfix_to_domain": [
  "abc.de",
  "def.de",
  "ghi.de"
],
"postfix_from_domain": "xxx.de"

We tried to implement this with the dissect filter:

dissect {
  mapping => {
    "postfix_from" => "%{}@%{postfix_from_domain}"
    "postfix_to" => "%{}@%{postfix_to_domain}"
  }
}

But the dissect filter doesn't iterate over the array. The result looks horrible:

"postfix_to_domain": "abc.de\", \"fabian.muster@def.de\", \"hamid.muster@ghi.de\"]",
"postfix_from_domain": "xxx.de"
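
The shape of that value is consistent with the whole array being turned into one string before the pattern is applied, so that everything after the first "@" lands in the field. That is an assumption drawn from the output above, not something confirmed against the dissect filter itself. A minimal plain-Ruby sketch of the effect (the helper name dissect_like_capture is hypothetical and only emulates the "%{}@%{...}" mapping):

```ruby
# Hypothetical emulation of the mapping "%{}@%{postfix_to_domain}":
# skip everything up to the first "@" and capture the rest.
# This is plain Ruby, not the dissect filter.
def dissect_like_capture(value)
  # to_s on an Array yields its inspect form, brackets and quotes included.
  value.to_s.split("@", 2)[1]
end

postfix_to = ["max.muster@abc.de", "fabian.muster@def.de", "hamid.muster@ghi.de"]
puts dissect_like_capture(postfix_to)
```

With a plain string such as "noreply@xxx.de" the same helper returns only "xxx.de", which matches the correct postfix_from_domain above.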

Is it possible to iterate over an array with dissect, grok, or some other filter and treat each value individually?

Thank you
Joel


(Makara) #2

If the number of to and from fields in the log lines always stays the same, a simple grok pattern can extract the desired values without the kv filter.

from=<%{DATA:from_address}@%{DATA:from_domain}> to=<%{DATA:to_address1}@%{DATA:to_domain1}> to=<%{DATA:to_address2}@%{DATA:to_domain2}> to=<%{DATA:to_address3}@%{DATA:to_domain3}>


#3

Thank you for your reply.

That's the problem: the number of "from" and "to" fields is not constant.
There is always at least one "from" and one "to" field, but there can be several more. So I can't use numbered fields like "to_domain1", "to_domain2", "to_domain3", ... It needs to be an array.

Can I use the dissect filter to iterate over an array? A for each loop around the dissect?
Or is my approach for this problem completely wrong?


(Makara) #4

@ftds
I am not very familiar with the dissect filter, but since you have a variable number of to and from fields, you can easily extract the domain names from the postfix_to field and add them to the document with a ruby filter.
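
As a rough illustration of that idea, the core logic could look like the sketch below. It is plain Ruby that runs outside Logstash; the helper name domains_of is hypothetical. Inside a ruby filter, the input would come from event.get("postfix_to") and the result would be written back with event.set.

```ruby
# Hypothetical helper (not part of Logstash): returns the domain part
# of every e-mail address in value.
def domains_of(value)
  Array(value)                                            # nil -> [], single string -> [string]
    .select { |v| v.is_a?(String) && v.include?("@") }  # ignore values without an "@"
    .map { |v| v.split("@", 2)[1] }                      # keep everything after the first "@"
    .reject(&:empty?)                                    # drop addresses that end in "@"
end

p domains_of(["max.muster@abc.de", "fabian.muster@def.de", "hamid.muster@ghi.de"])
p domains_of("noreply@xxx.de")
```

Wrapping the value with Array() means the same code handles both the single-value case (a string) and the multi-value case (an array).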


#5

I solved the problem with a custom ruby filter.
Thank you for this tip.

ruby {
  code => '
    # kv yields a string for one value and an array for several, so normalize to an array.
    postfix_from = [event.get("postfix_from")].flatten.compact
    # Keep the part after the "@" of every value that contains one.
    postfix_from_domain = postfix_from.select { |i| i =~ /@.+/ }.map { |i| i.split("@")[1] }
    if postfix_from_domain.count > 0
      # Store a single domain as a string, several domains as an array.
      postfix_from_domain = postfix_from_domain[0] if postfix_from_domain.count == 1
      event.set("postfix_from_domain", postfix_from_domain)
    end
  '
}
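
The filter above covers postfix_from only. A possible generalization that handles postfix_to in the same pass is sketched below. It is plain Ruby so it can be tried outside Logstash: FakeEvent is a hypothetical stand-in that offers only the get and set calls used above, and add_domain_fields is an illustrative name, not an existing API.

```ruby
# Hypothetical stand-in for the Logstash event, only so the sketch runs in plain Ruby.
class FakeEvent
  def initialize(data)
    @data = data
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

# Illustrative helper: adds "<field>_domain" for every listed field that holds addresses.
def add_domain_fields(event, fields = ["postfix_from", "postfix_to"])
  fields.each do |field|
    domains = Array(event.get(field))
              .select { |v| v.is_a?(String) && v =~ /@.+/ }
              .map { |v| v.split("@")[1] }
    next if domains.empty?
    # As in the filter above: one domain is stored as a string, several as an array.
    event.set("#{field}_domain", domains.size == 1 ? domains[0] : domains)
  end
end

event = FakeEvent.new(
  "postfix_from" => "noreply@xxx.de",
  "postfix_to" => ["max.muster@abc.de", "fabian.muster@def.de", "hamid.muster@ghi.de"]
)
add_domain_fields(event)
p event.get("postfix_from_domain")
p event.get("postfix_to_domain")
```

In a real pipeline, only the body of add_domain_fields would go into the code string of the ruby filter, operating on the actual event.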
