Extracting Domain from URL

Is it possible to extract only the domain from a URL using Logstash?
URL:
e10.whatsapp.net
mtalk.google.com
teredo.ipv6.microsoft.com
extshort.weixin.qq.com

Domain:
whatsapp.net
google.com
microsoft.com
qq.com

How about:

filter {
  if [URL] {
    ruby {
      code => "
        # Capture the last two dot-separated labels, e.g. 'e10.whatsapp.net' => 'whatsapp.net'.
        m = event['URL'].match(/^.*\.((.*?)\.(.*?))$/)
        event['Domain'] = m[1] if m
      "
    }
  }
}

You might be able to do it with grok too, but I don't see an obvious way. It might require more brain power than just using ruby.

Cheers.

Untested:

filter {
  grok {
    match => ["URL", "\.(?<Domain>[^.]+\.[^.]+)$"]
  }
}

Splitting the string on each period, grabbing the last two elements, and joining them back together should be a lot more efficient, though.
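For example, something along these lines (an untested sketch, using the same URL and Domain field names as the ruby filter above):

filter {
  if [URL] {
    ruby {
      code => "
        # Split on dots and keep the last two labels,
        # e.g. 'e10.whatsapp.net' => 'whatsapp.net'.
        parts = event['URL'].split('.')
        event['Domain'] = parts.last(2).join('.') if parts.length >= 2
      "
    }
  }
}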

This will, however, only take the last two parts of the URL and return those. How do you then handle domains whose suffix consists of more than one part, e.g.
tachiarai.fukuoka.jp
blogspot.co.uk
wa.edu.au
ap-southeast-2.compute.amazonaws.com

I have found a very interesting site that updates a list of all the domain suffixes on the internet on a regular basis; here is the link: https://publicsuffix.org/list/public_suffix_list.dat
Is it possible to use this file as a reference for the suffixes and then take one more label to get the actual name of the site?

This will, however, only take the last two parts of the URL and return those.

Yes, of course. It wasn't obvious what you meant by domain.

I have found a very interesting site that updates a list of all the domain suffixes on the internet on a regular basis; here is the link: https://publicsuffix.org/list/public_suffix_list.dat
Is it possible to use this file as a reference for the suffixes and then take one more label to get the actual name of the site?

Sure, a custom filter plugin could easily do that.
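The core of the lookup would be roughly this in plain Ruby (a rough, untested sketch; it assumes public_suffix_list.dat has been downloaded locally, and it ignores the wildcard and exception rules the real list contains):

require 'set'

# Load the suffix list, skipping comments and blank lines.
suffixes = File.readlines('public_suffix_list.dat')
               .map(&:strip)
               .reject { |line| line.empty? || line.start_with?('//') }
               .to_set

def registrable_domain(host, suffixes)
  labels = host.downcase.split('.')
  # Walk from the full host down to ever shorter suffixes, so the first
  # hit is the longest matching public suffix (e.g. 'co.uk' before 'uk').
  (0...labels.length).each do |i|
    candidate = labels[i..-1].join('.')
    if suffixes.include?(candidate)
      return i.zero? ? nil : labels[(i - 1)..-1].join('.')
    end
  end
  nil
end

registrable_domain('www.bbc.co.uk', suffixes)  # => "bbc.co.uk"

A real plugin would also have to handle the list's wildcard (*.ck) and exception (!www.ck) rules.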

Could you kindly elaborate on how to go about doing this?

Have a look at this page:
https://www.elastic.co/guide/en/logstash/current/_how_to_write_a_logstash_filter_plugin.html

Thank you for the link. Is there also a forum for Ruby coding where I can seek assistance when I get stuck?

Ruby is pretty straightforward. Most of the folks here can probably provide you with assistance. Just read the documentation.

Thank you, will try and find a starting point somewhere.

Hi all, a quick question. I am very new to plugin writing and Ruby; would it be possible for someone to provide some guidance? Even after going through the documentation it is still quite overwhelming. What I would like to do is take the URL and extract only the domain from it by comparing the URL against the public_suffix_list.dat list (https://publicsuffix.org/list/public_suffix_list.dat) and then keeping one extra label. So for a URL like www.bbc.co.uk, comparing it to the public_suffix_list.dat file would give a hit on .co.uk; keeping one more label then gives bbc.co.uk.

Any assistance in getting started would be appreciated.

Look at the second answer from here:

Here's the lib they reference:

Looks fairly straightforward, hopefully it works out for you.
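For what it's worth, if the library is the public_suffix gem (an assumption on my part), the ruby filter could be as small as this untested sketch, provided the gem is installed in Logstash's Ruby environment:

filter {
  if [URL] {
    ruby {
      init => "require 'public_suffix'"
      code => "
        # PublicSuffix.domain returns the registrable domain,
        # e.g. 'www.bbc.co.uk' => 'bbc.co.uk', or nil if the name can't be parsed.
        d = PublicSuffix.domain(event['URL'])
        event['Domain'] = d if d
      "
    }
  }
}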

Thank you very much Robt, I will try this.

So the grok filter posted doesn't work for all domains?

I have a 'referer' entry in my nginx log (e.g. "https://abc.storage.googleapis.com/app/desktop.html?a=1") that I want to extract just the domain from, excluding the protocol. From the (rather confusing) grok patterns I have come up with this:

match => { "referer" => "%{HOST:referer_domain}" }

But I see in the Logstash log that this fails. So please, how do I do this using a grok pattern?

match => { "referer" => "%{HOST:referer_domain}" }

Try this:

match => { "referer" => "%{URIPROTO}://%{URIHOST:referer_domain}" }