I tried many approaches with grok etc., but none of them worked. The thing is, I can get an arbitrary domain name, with no idea how many subdomains it has. I need only the last two labels in a field.
This grok pattern gives you the result you want: %{WORD:name}\.%{WORD:TLD}$
With this pattern, you say that you want two words separated by a literal dot (escaped as \. because an unescaped dot matches any character), and that they must sit at the end of your field (the $ character).
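For anyone who wants to check the behaviour outside Logstash, here is a minimal sketch of roughly the same match in Python. It assumes grok's WORD is equivalent to `\b\w+\b` and that the dot is escaped; the function name `last_two_labels` is only for illustration.

```python
import re

# Rough Python equivalent of %{WORD:name}\.%{WORD:TLD}$
# (assumption: grok's WORD pattern is \b\w+\b).
LAST_TWO = re.compile(r"(?P<name>\b\w+\b)\.(?P<TLD>\b\w+\b)$")

def last_two_labels(domain):
    """Return (name, TLD) for the last two dot-separated words, or None."""
    match = LAST_TWO.search(domain)
    if match is None:
        return None
    return match.group("name"), match.group("TLD")

print(last_two_labels("a.b.c.example.com"))  # ('example', 'com')
print(last_two_labels("localhost"))          # None
```

Note that `\w` does not match a hyphen, so a label such as `my-site` would be cut down to `site` by this pattern.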
You have to be careful with this, because many country-code TLDs add an extra label at the end of the domain. So qbc.co becomes qbc.co.uk, and this logic would return only co.uk. I don't think Logstash has a registered domain processor, but Filebeat and Elasticsearch do. See Registered Domain | Filebeat Reference [7.12] | Elastic.
The 'tld' filter exists precisely for this purpose. Doing this correctly is non-trivial and cannot be done purely algorithmically; you need the 'public suffix' data, which is what the 'tld' plugin (and others like it) uses.
Cheers,
Cameron
(PS. In case you come across a plugin I wrote called logstash-filter-dnssummary, I would suggest you stick with the 'tld' plugin, just because it's better maintained; unless perhaps you care about Unicode and IDNA)
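To illustrate why the public suffix data matters, here is a rough Python sketch of the lookup idea. It is not how the 'tld' plugin is implemented; `SAMPLE_SUFFIXES` is a tiny hand-picked set used only for this example, and `registered_domain` is a hypothetical helper name. The real public suffix list is far larger and also has wildcard and exception rules that this sketch ignores.

```python
# Tiny, hand-picked sample for illustration only; the real public suffix
# list is far larger and has wildcard and exception rules ignored here.
SAMPLE_SUFFIXES = {"com", "org", "uk", "co.uk"}

def registered_domain(domain, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching suffix plus one more label, or None."""
    labels = domain.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the whole name is itself a public suffix.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

print(registered_domain("www.qbc.co.uk"))    # qbc.co.uk
print(registered_domain("a.b.example.com"))  # example.com
```

With the suffix data available, qbc.co.uk is kept whole instead of being reduced to co.uk, which a fixed "last two labels" rule cannot do.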
Thanks, all, for the tld suggestion. The thing is, my Logstash server does not have access to the internet, and the tld filter needs to be installed manually; it does not come with the bundle. Any suggestions?
Or, if you only lack direct access to the internet but can still go through an HTTP proxy, you could install it via the proxy (port 3128 is the default port for a Squid proxy; other proxies vary).