Fix grok pattern

Hello, I have a log file that contains a line like this:
2017-05-01 07:41:25 [scraper.py] DEBUG: Scraped from <200 https://www.diwanegypt.com/>{'category_ar': u'biographies'}

I want to extract the URL from this line into a field called scraped_url. I tried the regex (?<=<).*(?=>), but it didn't work. Any ideas?

I also tried this grok pattern:
%{TIMESTAMP_ISO8601} %{NOTSPACE} DEBUG: Scraped from <200 %{GREEDYDATA:scraped_url}
It matches, but scraped_url also picks up everything after the URL:

"scraped_url":
[
"https://www.diwanegypt.com/>{'category_ar': u'biographies'}"
]

Please add a stdout { codec => rubydebug } output and post what it prints, so we can see exactly what the event looks like.
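
For reference, a minimal sketch of that debugging output, assuming it is added alongside whatever outputs the pipeline already has:

```
output {
  # Print each event to the console as a pretty-printed hash
  stdout { codec => rubydebug }
}
```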


You could try something like:

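# Only run grok on lines that mention "Scraped from"
if [message] =~ /.*Scraped\sfrom.*/ {
    grok {
        # Skip to the 3-digit status code, then capture everything up to
        # (but not including) the closing "/>"
        match => { "message" => ".*\<\d{3}\s(?<scraped_url>.*(?=\/\>))" }
    }
}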
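
One thing to note about the pattern above: the lookahead requires the URL to end in "/", and that final slash is left out of the captured value. A possible variant, sketched here under the assumption that every matching line has the form <status URL>, captures everything up to the first ">" instead. The field name status_code is only an illustrative choice and does not come from the original setup.

```
filter {
  if [message] =~ /Scraped from/ {
    grok {
      # [^>]+ stops at the first ">", so the trailing slash is kept and
      # URLs that do not end in "/" still match.
      match => { "message" => "Scraped from <%{INT:status_code} (?<scraped_url>[^>]+)>" }
    }
  }
}
```

For the sample line, this should give scraped_url the value https://www.diwanegypt.com/ and status_code the value 200.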

Thank you guys, @Kryten's solution is working.
