In my DNS logs I have a lot of very long hostnames, for example those from CDNs. They can have one or more levels of subdomains, and I want to strip these so that only the registered domain remains:
safebrowsing.googleapis.com > googleapis.com
img-prod.pocket.prod.cloudops.mozgcp.net > mozgcp.net
And then there are suffixes like co.uk that contain a dot themselves.
Splitting on the dot was one thing I tried, but it didn't work out because the number of resulting fields varies from hostname to hostname (and I have no idea how to handle this in Logstash).
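A minimal sketch of the dot-splitting idea in Python (used here only for illustration, not Logstash syntax; the function name is hypothetical). Indexing from the end of the label list sidesteps the varying field count, but keeping only the last two labels still breaks on co.uk:

```python
def naive_registered_domain(hostname: str) -> str:
    # Hypothetical helper: split on dots and keep the last two labels.
    # Counting from the end avoids caring how many subdomain levels exist.
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])


print(naive_registered_domain("safebrowsing.googleapis.com"))  # googleapis.com
print(naive_registered_domain("img-prod.pocket.prod.cloudops.mozgcp.net"))  # mozgcp.net
print(naive_registered_domain("www.bbc.co.uk"))  # co.uk (wrong: should be bbc.co.uk)
```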
I found a regex approach to identifying TLDs (Stack Overflow: "Regex to extract the top level domain from a URL"), but it doesn't help with the co.uk case. I also couldn't find a reference on how to actually apply a regex to a string.
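For reference, this is how a regex is applied to a string with Python's standard `re` module (again Python, not Logstash). The pattern is a hypothetical one that captures the last two labels, not the one from the Stack Overflow answer, and it shows that any fixed pattern hits the same co.uk problem:

```python
import re
from typing import Optional

# Hypothetical pattern: two dot-separated labels anchored at the end,
# allowing an optional trailing dot (fully qualified DNS names).
LAST_TWO_LABELS = re.compile(r"([^.]+\.[^.]+)\.?$")


def regex_registered_domain(hostname: str) -> Optional[str]:
    # re.search scans for the first position where the pattern matches;
    # because of the $ anchor, that is the last two labels.
    match = LAST_TWO_LABELS.search(hostname)
    return match.group(1) if match else None


print(regex_registered_domain("safebrowsing.googleapis.com"))  # googleapis.com
print(regex_registered_domain("www.bbc.co.uk"))  # co.uk (same problem)
print(regex_registered_domain("localhost"))  # None
```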
Can you suggest an elegant solution for this?
No. See my discussion of this here. In .uk, .dz (Algeria), and other TLDs, some second-level domains are names in their own right, while others are labels under which the registry assigns domain names (like co.uk). For example, if I wanted to register badger2022.uk, Nominet would show me dozens of registrars willing to host it as a second-level domain. I could then run DNS servers for it and create foo.bar.badger2022.uk under it.
Trying to track the policies of hundreds of TLD registries on which second-level domains are names and which are labels is never going to be elegant.
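The only general approach is to look hostnames up against a maintained list of such suffixes, such as the Mozilla-curated Public Suffix List. A minimal Python sketch of that lookup, assuming a tiny hand-picked suffix set in place of the real list (which also has wildcard and exception rules not handled here); the set and function name are hypothetical:

```python
from typing import Optional, Set

# Hypothetical, hand-picked subset standing in for the full Public Suffix List.
SUFFIXES: Set[str] = {"com", "net", "uk", "co.uk", "org.uk"}


def registered_domain(hostname: str, suffixes: Set[str] = SUFFIXES) -> Optional[str]:
    labels = hostname.rstrip(".").lower().split(".")
    # Try candidate suffixes from longest to shortest; the first hit is the
    # longest matching suffix, and the registered domain is one label more.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # unknown suffix: the list would need updating


print(registered_domain("www.bbc.co.uk"))  # bbc.co.uk
print(registered_domain("foo.bar.badger2022.uk"))  # badger2022.uk
```

The results are only as good as the suffix list, which is exactly the maintenance burden described above.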