Extract URL from Text Field into a New Field Called URL


(Isra) #1

I'm inputting a field called text. This field may sometimes contain a URL. What I would like to do is extract the URLs from text and put them in a new field called URL.

I tried grok, but it seems that grok patterns need a specific log format in order to work. For example, the following will work:

5546 hello www.google.com
{id} {text} {URL}

But the following won't:

4324 hello my name is Ryan www.yahoo.com
{id} {text} {URL}

Instead, it takes "hello" as text and doesn't capture www.yahoo.com as the URL. Is there a way around this? Please note that sometimes the text might look like this:

www.gmail.com hello everyone 

What filter can I use in order to extract the URL from the text coming into Logstash?

Thank you.


(Magnus Bäck) #2

The grok filter is the right tool for the job. Since the position of the URL isn't the same all the time, you can't use a grok expression that extracts word N from the input text. Instead you need to construct a pattern that recognizes the URL itself, i.e. by looking for a word that starts with "www.", or perhaps any space-delimited sequence of characters that begins with something that looks like a domain name. For example,

(?<url>[a-z0-9-]+\.[a-z0-9-]+\S+)

captures into the field url anything that begins like a hostname with at least two labels ("www.google.com", "google.com" but not "com") followed by any number of non-space characters.
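A minimal sketch of how that pattern could be wired into a Logstash pipeline, assuming the text already lives in a field named text (as in the question). Grok searches the field rather than anchoring at the start, so the URL is found wherever it appears, but only the first URL-like word per event is captured. The tag_on_failure setting is optional and only keeps events without a URL from being tagged _grokparsefailure.

```
filter {
  grok {
    # Unanchored search of the "text" field: the URL can be at the start,
    # middle or end of the message. The first match goes into "url".
    match => { "text" => "(?<url>[a-z0-9-]+\.[a-z0-9-]+\S+)" }
    # Don't add the default _grokparsefailure tag to events with no URL.
    tag_on_failure => []
  }
}
```

With the three inputs from the question this should yield url values of www.google.com, www.yahoo.com and www.gmail.com. If you want the field to be called URL rather than url, rename the capture to (?<URL>...). Note that the character classes are lowercase only, so a URL written as WWW.Example.com would be matched only partially.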

(Strictly speaking I believe a URL needs to begin with a scheme, like http://.)
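If your URLs do include a scheme, the standard pattern set bundled with grok has a URI pattern that could be used instead of a hand-written expression. A sketch under that assumption; it will not match bare words like www.yahoo.com from the question.

```
filter {
  grok {
    # URI is one of grok's bundled patterns. It requires a scheme such as
    # http:// or https://, so scheme-less "www.example.com" words won't match.
    match => { "text" => "%{URI:url}" }
    tag_on_failure => []
  }
}
```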

