Extract domain with grok

Hello!
I have logs in which the "destination" field looks like one of these:

s03.amazon.com
a63.google.com
ayndex.cc
s23.a4.test.com
test.google.co.uk

Is it possible, with the grok filter in Logstash or in some other way, to split the field and keep only the 2nd level domain? For the examples above I need only:

amazon.com
google.com
ayndex.cc
test.com
google.co.uk

Big thanks for any help!

Certainly, but a strict second level domain is not what you want. If the name is within .co.uk you want to include the third level name. Some TLDs have a set of second level names used for labelling, but they also accept registrations directly at the second level, so the regexp is going to get complicated. I think this handles most of .uk and .dz (Algeria) correctly; you would have to expand it to include other countries.

"(?<domainName>(((([^.]+\.(art|asso|com|edu|gov|net|org|pol|tm|soc)))|[^.]+)\.dz|((([^.]+\.(ac|co|gov|judiciary|ltd|me|mod|net|nhs|nic|org|parliament|plc|police|royal|sch)))|[^.]+)\.uk|([^.]+\.[^.]+)))$"

That will match "windsor.royal.uk" from "test.windsor.royal.uk", but "blah.uk" from "test.windsor.blah.uk", because "blah" is not one of the listed second level labels. This Wikipedia page has links to pages for the 2nd level labels used by several country level domains.
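To check how the pattern behaves on the sample values, here is a minimal Python sketch of the same regexp. It is only a test harness, not Logstash code: Python's re module writes named groups as (?P<name>...) rather than grok's (?<name>...), and the names DZ_LABELS, UK_LABELS, DOMAIN_RE and registrable_domain are made up for this illustration.

```python
import re

# Second level labels copied from the grok pattern above.
DZ_LABELS = "art|asso|com|edu|gov|net|org|pol|tm|soc"
UK_LABELS = ("ac|co|gov|judiciary|ltd|me|mod|net|nhs|nic|org|"
             "parliament|plc|police|royal|sch")

# Same alternation as the grok pattern, using Python's (?P<name>...) syntax.
DOMAIN_RE = re.compile(
    r"(?P<domainName>"
    rf"(?:[^.]+\.(?:{DZ_LABELS})|[^.]+)\.dz"   # name.label.dz or name.dz
    rf"|(?:[^.]+\.(?:{UK_LABELS})|[^.]+)\.uk"  # name.label.uk or name.uk
    r"|[^.]+\.[^.]+"                           # everything else: last two labels
    r")$"
)

def registrable_domain(host):
    """Return the part of host captured as domainName, or None if no match."""
    m = DOMAIN_RE.search(host)
    return m.group("domainName") if m else None

for host in ["s03.amazon.com", "ayndex.cc", "test.google.co.uk",
             "test.windsor.royal.uk", "test.windsor.blah.uk"]:
    print(host, "->", registrable_domain(host))
```

Because the search is unanchored on the left and anchored on the right by $, the engine returns the leftmost starting point from which one of the alternatives reaches the end of the string.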

By using [^.]+ for names it has at least a chance of working with i18n DNS. For example, ουτοπία.δπθ.gr is OK: it matches δπθ.gr

Overall I think trying to build a regexp that encapsulates the policies of dozens of different registries is a losing game. A slightly less bad approach would be to use multiple groks:

if [someField] =~ /\.((art|asso|com|edu|gov|net|org|pol|tm|soc)\.dz|(ac|co|gov|judiciary|ltd|me|mod|net|nhs|nic|org|parliament|plc|police|royal|sch)\.uk)$/ {
    # Known second level label: keep the last three labels
    grok { match => { "someField" => "(?<domainName>[^.]+\.[^.]+\.[^.]+)$" } }
} else {
    # Anything else: keep the last two labels
    grok { match => { "someField" => "(?<domainName>[^.]+\.[^.]+)$" } }
}
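The same two-branch idea can be written without regexps as a lookup against a table of two-label suffixes. The Python sketch below is only an illustration of that logic: TWO_LABEL_SUFFIXES is a hypothetical, deliberately tiny table and second_level_domain is a made-up name; a real table would need an entry for every suffix you care about.

```python
# Hypothetical, incomplete table of suffixes under which names are
# registered at the third level. Extend it for the TLDs in your data.
TWO_LABEL_SUFFIXES = {"co.uk", "ac.uk", "org.uk", "gov.uk", "com.dz", "org.dz"}

def second_level_domain(host, suffixes=TWO_LABEL_SUFFIXES):
    """Keep three labels under a known two-label suffix, otherwise two."""
    labels = host.strip(".").lower().split(".")
    keep = 3 if ".".join(labels[-2:]) in suffixes else 2
    return ".".join(labels[-keep:])

for host in ["s23.a4.test.com", "test.google.co.uk", "test.windsor.blah.uk"]:
    print(host, "->", second_level_domain(host))
```

This has the same weakness as the groks: the result is only as good as the table, which has to be kept in step with each registry's policy.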

I cannot think of a good solution.
