Hi,
In my DNS logs I have a lot of very long hostnames, for example those from CDNs. They can have one or more levels of subdomains, and I would like to strip these so that only the main domain remains.
For example:
safebrowsing.googleapis.com > googleapis.com
img-prod.pocket.prod.cloudops.mozgcp.net > mozgcp.net
And then there are suffixes like co.uk that contain a dot themselves, so the main domain is not always just the last two parts.
Splitting on the dot was one thing I tried, but it didn't work out because the number of resulting fields varies from one hostname to the next (and I have no idea how to handle that in Logstash).
I found this solution describing a regex approach to identify TLDs: Regex to extract the top level domain from a URL - Stack Overflow, but it doesn't help with the co.uk example. I also can't find a reference on how to actually apply a regex to a string in Logstash.
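To make the goal concrete, here is a rough Python sketch of the behaviour I'm after (it is not Logstash config, just an illustration of the logic). The function name registered_domain, the hostname www.example.co.uk, and the small suffix set are made up for this example; a complete list of such suffixes would be much longer.

```python
# Hypothetical, hand-picked sample of suffixes that contain a dot themselves.
# This is an assumption for illustration; a real list would be far longer.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}


def registered_domain(hostname: str) -> str:
    """Reduce a hostname to one label plus its suffix."""
    # Normalise case and drop a trailing root dot before splitting.
    labels = hostname.lower().rstrip(".").split(".")
    # If the last two labels form a known multi-label suffix, keep three labels.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    # Otherwise keep the last two labels (or the whole name if it is shorter).
    return ".".join(labels[-2:])


if __name__ == "__main__":
    for name in (
        "safebrowsing.googleapis.com",
        "img-prod.pocket.prod.cloudops.mozgcp.net",
        "www.example.co.uk",
    ):
        print(name, ">", registered_domain(name))
```

The point is that the lookup against a suffix list has to happen before deciding how many parts to keep, which is exactly what I don't know how to express in Logstash.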
Can you suggest an elegant solution for this?
Thank you!