Extract domain with grok

Badger · December 16, 2021, 5:31pm

Certainly, but that is not what you want. If the name is within .co.uk you want to include the third level name. For some TLDs there is a set of second level names used for labelling, but they also accept registrations at the second level. So the regexp is going to get complicated. I think this handles most of .uk and .dz (Algeria) correctly, but you would have to expand it to include other countries.

"(?<domainName>(((([^.]+\.(art|asso|com|edu|gov|net|org|pol|tm|soc)))|[^.]+)\.dz|((([^.]+\.(ac|co|gov|judiciary|ltd|me|mod|net|nhs|nic|org|parliament|plc|police|royal|sch)))|[^.]+)\.uk|([^.]+\.[^.]+)))$"

That will match "windsor.royal.uk" from "test.windsor.royal.uk", but only "blah.uk" from "test.windsor.blah.uk", because "blah" is not in the list of second level labels. This Wikipedia page has links to pages listing the 2nd level labels for several country level domains.
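To check that behaviour outside Logstash, here is a rough Python port of the pattern. It is only a sketch: Python spells named groups `(?P<name>...)` where grok uses `(?<name>...)`, the redundant parentheses are dropped, and the names `DOMAIN_RE` and `registered_domain` are made up for this example.

```python
import re

# Second level labels copied from the grok pattern above.
DZ_SECOND_LEVEL = "art|asso|com|edu|gov|net|org|pol|tm|soc"
UK_SECOND_LEVEL = ("ac|co|gov|judiciary|ltd|me|mod|net|nhs|nic|org|"
                   "parliament|plc|police|royal|sch")

# Three alternatives, all anchored at the end of the string:
#   1. name[.label].dz   2. name[.label].uk   3. any last two labels
DOMAIN_RE = re.compile(
    r"(?P<domainName>"
    rf"(?:[^.]+\.(?:{DZ_SECOND_LEVEL})|[^.]+)\.dz"
    rf"|(?:[^.]+\.(?:{UK_SECOND_LEVEL})|[^.]+)\.uk"
    r"|[^.]+\.[^.]+"
    r")$"
)


def registered_domain(host):
    """Return the captured domainName, or None when nothing matches."""
    m = DOMAIN_RE.search(host)
    return m.group("domainName") if m else None


if __name__ == "__main__":
    for host in ("test.windsor.royal.uk", "test.windsor.blah.uk",
                 "www.example.co.uk", "ουτοπία.δπθ.gr"):
        print(host, "->", registered_domain(host))
```

A single-label name such as "localhost" does not match at all, so in Logstash the grok would tag the event with _grokparsefailure.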

By using [^.]+ for names it has at least a chance of working with internationalized domain names. For example, ουτοπία.δπθ.gr is OK: it matches δπθ.gr

Overall I think trying to build a regexp that encapsulates the policies of dozens of different registries is a losing game. A slightly less bad approach would be to use multiple groks...

if [someField] =~ /\.((art|asso|com|edu|gov|net|org|pol|tm|soc)\.dz|(ac|co|gov|judiciary|ltd|me|mod|net|nhs|nic|org|parliament|plc|police|royal|sch)\.uk)$/ {
    # Known second level label: keep the last three labels.
    grok { match => { "someField" => "(?<domainName>[^.]+\.[^.]+\.[^.]+)$" } }
} else {
    # Anything else: keep the last two labels.
    grok { match => { "someField" => "(?<domainName>[^.]+\.[^.]+)$" } }
}
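The same two-step idea can be tried out in Python before putting it in a pipeline. This is a sketch under a couple of assumptions: the suffix test requires a literal dot in front of the second level label (so "foo.tobacco.uk" is not mistaken for a .co.uk name), and the names `SECOND_LEVEL_SUFFIX` and `extract_domain` are invented for the example.

```python
import re

# Step 1: does the name end in a known second level label under .dz or .uk?
SECOND_LEVEL_SUFFIX = re.compile(
    r"\.(?:(?:art|asso|com|edu|gov|net|org|pol|tm|soc)\.dz"
    r"|(?:ac|co|gov|judiciary|ltd|me|mod|net|nhs|nic|org|parliament|plc"
    r"|police|royal|sch)\.uk)$"
)

# Step 2: keep the last three labels if so, otherwise the last two.
THREE_LABELS = re.compile(r"[^.]+\.[^.]+\.[^.]+$")
TWO_LABELS = re.compile(r"[^.]+\.[^.]+$")


def extract_domain(host):
    """Return the trailing two or three labels of host, or None."""
    pattern = THREE_LABELS if SECOND_LEVEL_SUFFIX.search(host) else TWO_LABELS
    m = pattern.search(host)
    return m.group(0) if m else None


if __name__ == "__main__":
    for host in ("test.example.co.uk", "test.windsor.blah.uk",
                 "foo.tobacco.uk", "www.example.com"):
        print(host, "->", extract_domain(host))
```

Supporting another country then only means extending the suffix test; the two label patterns stay the same.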

I cannot think of a good solution.
