Is it possible to extract only the domain from a URL using Logstash?
URL:
e10.whatsapp.net
mtalk.google.com
teredo.ipv6.microsoft.com
extshort.weixin.qq.com
Domain:
whatsapp.net
google.com
microsoft.com
qq.com
How about:
filter {
  if [URL] {
    ruby {
      code => "event['Domain'] = event['URL'].match(/^.*\.((.*?)\.(.*?))$/)[1]"
    }
  }
}
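For reference, the regex in that ruby filter can be exercised outside Logstash. A minimal sketch using the sample hostnames from the question:

```ruby
# Sample hostnames from the question; the regex is the same one used in
# the ruby filter: the greedy prefix backtracks until the capture group
# can hold exactly the last two dot-separated labels.
%w[e10.whatsapp.net mtalk.google.com extshort.weixin.qq.com].each do |url|
  puts url.match(/^.*\.((.*?)\.(.*?))$/)[1]
end
# prints: whatsapp.net, google.com, qq.com
```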
It might be possible with grok too, but I don't see an obvious way. It might require more brain power than just using ruby.
Cheers.
Untested:
filter {
  grok {
    match => ["URL", "\.(?<Domain>[^.]+\.[^.]+)$"]
  }
}
Splitting the string on each period, grabbing the last two elements, and joining them back together should be a lot more efficient, though.
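That split-based approach might look like this (a sketch; `last_two_labels` is a hypothetical helper name, and it still assumes the domain is always exactly the final two labels):

```ruby
# Split on dots, keep the last two labels, and rejoin them.
# Assumes the registrable domain is always exactly two labels deep.
def last_two_labels(host)
  host.split('.').last(2).join('.')
end

puts last_two_labels('teredo.ipv6.microsoft.com')  # => microsoft.com
```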
This will, however, take only the last two parts of the URL. How do you then handle domains whose suffix consists of more than one part, e.g.
tachiarai.fukuoka.jp
blogspot.co.uk
wa.edu.au
ap-southeast-2.compute.amazonaws.com
I have found a very interesting site that maintains a regularly updated list of all public suffixes on the internet; here is the link: https://publicsuffix.org/list/public_suffix_list.dat
Is it possible to use this file as a reference for the suffixes and then take one more label to get the actual name of the site?
This will, however, take only the last two parts of the URL.
Yes, of course. It wasn't obvious what you meant by domain.
I have found a very interesting site that maintains a regularly updated list of all public suffixes on the internet; here is the link: https://publicsuffix.org/list/public_suffix_list.dat
Is it possible to use this file as a reference for the suffixes and then take one more label to get the actual name of the site?
Sure, a custom filter plugin could easily do that.
Could you kindly elaborate, how to go about doing this?
Have a look at this page:
https://www.elastic.co/guide/en/logstash/current/_how_to_write_a_logstash_filter_plugin.html
Thank you for the link, is there a forum also for ruby coding to seek assistance when getting stuck?
Ruby is pretty straight-forward. Most of the folks here can probably provide you with assistance. Just read the documentation.
Thank you, will try and find a starting point somewhere.
Hi all, a quick question. I am very new to plugin writing and Ruby; would it be possible for someone to provide some guidance? After going through the documentation and other information it is still quite overwhelming. What I would like to do is take the URL and extract only the domain from it by comparing the URL against the public suffix list (https://publicsuffix.org/list/public_suffix_list.dat) and keeping one extra label. So for a URL like www.bbc.co.uk, comparing it to the public_suffix_list.dat file gives a hit on .co.uk; adding one more label yields bbc.co.uk.
Any assistance in getting started would be appreciated.
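The matching described above could be sketched in plain Ruby. This is only a sketch: the suffix set is a tiny hand-picked stand-in for public_suffix_list.dat, the real list's wildcard (*) and exception (!) rules are ignored, and `registrable_domain` is a hypothetical helper name:

```ruby
require 'set'

# Tiny stand-in for public_suffix_list.dat (assumption: the real list
# is far larger and includes wildcard and exception rules).
SUFFIXES = Set.new(%w[com net co.uk com.au])

# Walk from the longest candidate suffix down; on the first hit,
# keep one extra label ("minus one") to get the registrable domain.
def registrable_domain(host)
  labels = host.split('.')
  (1...labels.length).each do |i|
    suffix = labels[i..-1].join('.')
    return labels[i - 1..-1].join('.') if SUFFIXES.include?(suffix)
  end
  nil
end

puts registrable_domain('www.bbc.co.uk')  # => bbc.co.uk
```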
Look at the second answer from here:
Here's the lib they reference:
Looks fairly straightforward, hopefully it works out for you.
Thank you very much Robt, I will try this.
So the grok filter posted doesn't work for all domains?
I have a 'referer' entry in my nginx log (e.g. "https://abc.storage.googleapis.com/app/desktop.html?a=1") that I want to extract just the domain from, excluding the protocol. From the grok patterns, which I find confusing, I came up with this:
match => { "referer" => "%{HOST:referer_domain}" }
But I see in the Logstash log that this fails. So, please, how do I do this using a grok pattern?
match => { "referer" => "%{HOST:referer_domain}" }
Try this:
match => { "referer" => "%{URIPROTO}://%{URIHOST:referer_domain}" }
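As a sanity check outside Logstash, Ruby's standard library `URI` extracts the host from the sample referer directly, which is roughly what the `%{URIHOST}` capture matches:

```ruby
require 'uri'

# Parse the referer from the question and take just the host part,
# dropping the protocol, path, and query string.
referer = 'https://abc.storage.googleapis.com/app/desktop.html?a=1'
puts URI.parse(referer).host  # => abc.storage.googleapis.com
```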
© 2020. All Rights Reserved - Elasticsearch