Fix grok pattern

ibrahimsharaf · June 14, 2017, 12:32am

Hello, I have a log file that contains a line like this:
2017-05-01 07:41:25 [scraper.py] DEBUG: Scraped from <200 https://www.diwanegypt.com/>{'category_ar': u'biographies'}

I want to extract the URL from this line into a field called scraped_url. I tried the regex (?<=<).*(?=>) but it didn't work. Any ideas?
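To see what the lookaround regex above actually matches on the sample line, here is a minimal sketch in Python. It assumes Python's `re` module behaves like grok's Oniguruma engine for this particular expression. The variable names are illustrative only.

```python
import re

line = ("2017-05-01 07:41:25 [scraper.py] DEBUG: Scraped from "
        "<200 https://www.diwanegypt.com/>{'category_ar': u'biographies'}")

# The lookbehind anchors right after "<", so the match starts at the
# status code, not at the URL.
match = re.search(r"(?<=<).*(?=>)", line)
matched_text = match.group(0)
print(matched_text)
```

On this line the match includes the status code in front of the URL. The expression also has no named capture group, so grok has nothing to store in a field.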

ibrahimsharaf · June 14, 2017, 12:35am

I also tried this pattern:
%{TIMESTAMP_ISO8601} %{NOTSPACE} DEBUG: Scraped from <200 %{GREEDYDATA:scraped_url}

Output:

"scraped_url":
[
"https://www.diwanegypt.com/>{'category_ar': u'biographies'}"
]
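The output above follows from GREEDYDATA being defined as `.*` in the standard grok patterns, so it runs to the end of the line. A rough Python sketch of the difference between that and a capture bounded by the closing `>` is below. The `[^>]+` alternative is an assumption of mine, not something proposed in this thread, and Python spells named groups `(?P<name>...)` where grok uses `(?<name>...)`.

```python
import re

line = ("2017-05-01 07:41:25 [scraper.py] DEBUG: Scraped from "
        "<200 https://www.diwanegypt.com/>{'category_ar': u'biographies'}")

# Equivalent of %{GREEDYDATA:scraped_url}: ".*" consumes the rest of the line.
greedy = re.search(r"Scraped from <200 (?P<scraped_url>.*)", line)

# Assumed alternative: a negated character class stops at the first ">".
bounded = re.search(r"Scraped from <\d{3} (?P<scraped_url>[^>]+)>", line)

greedy_url = greedy.group("scraped_url")
bounded_url = bounded.group("scraped_url")
print(greedy_url)
print(bounded_url)
```

The bounded version keeps the trailing slash of the URL and drops everything from `>` onward.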

magnusbaeck · June 14, 2017, 5:38am

Please post what a stdout { codec => rubydebug } output produces so we can see exactly what the event looks like.

Kryten · June 14, 2017, 7:27am

You could try something like:

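# Only run grok on lines that mention "Scraped from"
if [message] =~ /.*Scraped\sfrom.*/ {
    grok {
        # Capture everything after the 3-digit status code, up to (not including) the trailing "/>"
        match => { "message" => ".*\<\d{3}\s(?<scraped_url>.*(?=\/\>))" }
    }
}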
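A quick way to check what this pattern captures is to port it to Python, as in the sketch below. It assumes Python's `re` matches this expression the same way grok's regex engine does, and the named group is rewritten as `(?P<scraped_url>...)`. The second test line with example.com is hypothetical and not from the log file.

```python
import re

# Same expression as the grok pattern, with Python's named-group syntax.
pattern = re.compile(r".*\<\d{3}\s(?P<scraped_url>.*(?=\/\>))")

with_slash = ("DEBUG: Scraped from <200 https://www.diwanegypt.com/>"
              "{'category_ar': u'biographies'}")
# Hypothetical line whose URL does not end in "/".
without_slash = ("DEBUG: Scraped from <200 https://example.com/page>"
                 "{'category_ar': u'biographies'}")

captured = pattern.search(with_slash).group("scraped_url")
no_match = pattern.search(without_slash)
print(captured)
print(no_match)
```

Because the lookahead requires `/>`, the captured URL loses its trailing slash, and a URL that does not end in `/` would not match at all. That may be fine for this log, but it is worth knowing if other pages are scraped.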

ibrahimsharaf · June 14, 2017, 10:44am

Thank you both, @Kryten's solution works.
