Another custom log parsing


(Konstantin Kondakov) #1

I have a slightly modified Apache log format:

LogFormat "%h %l %u %t %{Host}i \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{nsdr}C\"" combined
LogFormat "%h %l %u %t \"%r\" \"%{Cookie}i\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
CustomLog logs/access_log combined

It produces messages like the following:

192.168.10.200 - - [08/May/2018:12:46:28 -0700] www.computerworld.com "GET /autocompleter?callback=jQuery110207036132516408653_1525808774720&featureClass=P&maxRows=12&query=Learning+R&style=full HTTP/1.1" 200 107 "https://www.computerworld.com/category/application-development/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" "-"

My Logstash filter is:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG},%{QUOTEDSTRING:NSDR}" }
  }
}

What am I missing?


(Magnus Bäck) #2

You seem to be missing that the separator between the user agent and the NSDR is a space and not a comma. There might be other problems too but that one stood out.


(Konstantin Kondakov) #3

I replaced the comma with a space, but still no luck. :-(

input {
  file {
    path => '/weblogs/**/.'
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG} %{QUOTEDSTRING:NSDR}" }
  }
}

What is the best way to troubleshoot issues like this?


(Magnus Bäck) #4

What is the best way to troubleshoot issues like this?

Build the expression gradually. Start with the simplest possible expression and verify that it works, then add the next component, and the next. At some point it'll break and then you've narrowed the problem down. In this case you'll have to expand COMBINEDAPACHELOG to its constituent parts.
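
As a minimal sketch of this incremental approach, one could start with only the leading fields and let GREEDYDATA absorb everything that has not been modeled yet. The field name `rest` is a placeholder chosen for illustration, and the square brackets are assumed to need backslash escapes so that they match literally:

```
filter {
  grok {
    # Step 1: match only the client address, identity, user, and timestamp.
    # GREEDYDATA captures the remainder so the overall match can still succeed.
    match => { "message" => '^%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] %{GREEDYDATA:rest}$' }
  }
}
```

Once this matches, move the next token out of `rest` and into the explicit pattern, then test again. The first step at which events start carrying the `_grokparsefailure` tag points at the field whose pattern does not fit the log line.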


(Christian Dahlqvist) #5

You have a field that contains www.computerworld.com right after the timestamp. In the COMBINEDAPACHELOG pattern the verb follows right after the timestamp. I suspect you will therefore need to define your own pattern.
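
A custom pattern along these lines might look like the sketch below. It follows the LogFormat from post #1 by adding a host field between the timestamp and the request and a quoted cookie field at the end. The field names `vhost` and `nsdr` are illustrative choices, not names from any stock pattern, and the expression has only been checked by hand against the sample line in post #1:

```
filter {
  grok {
    match => {
      # Same building blocks as the combined format, plus %{IPORHOST:vhost}
      # for the %{Host}i directive and a trailing %{QS:nsdr} for the cookie.
      "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] %{IPORHOST:vhost} "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent} %{QS:nsdr}'
    }
  }
}
```

The outer string uses single quotes so that the double quotes inside the pattern do not need escaping.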


(Konstantin Kondakov) #6

Found the following sample at https://github.com/elastic/examples/tree/master/Common%20Data%20Formats/apache_logs

Using the following syntax for Grok Parser:

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}

Example 1 (the textbook sample) works:

83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"

Example 2 (my log, which looks almost the same) does NOT work:

172.22.70.194 - - [09/May/2018:20:19:35 -0700] "GET /network HTTP/1.1" 200 576 "http://fastlyssl.pcworld.com/article/3269793/components-graphics/nvidia-kills-geforce-partner-program.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"


(Konstantin Kondakov) #7

Fixed! There was an extra space between directives in my Apache combined LogFormat. Here is the final version of the grok filter:

filter {
  grok {
    match => {
      "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent} %{QS:nsdr} %{QS:property}'
    }
  }
}


(Pjanzen) #8

I do not know if you are aware, but there is an online grok debugger, which is really handy for this type of problem.
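
Patterns can also be checked locally with a small throwaway pipeline that reads lines from standard input and prints the parsed event. The following is only a sketch: the file name `test.conf` and the deliberately short pattern are placeholders to be replaced with the expression under test.

```
input {
  stdin { }
}

filter {
  grok {
    # Replace this with the pattern being debugged.
    match => { "message" => '%{IPORHOST:clientip} %{GREEDYDATA:rest}' }
  }
}

output {
  # rubydebug prints every field of the event, including any failure tags.
  stdout { codec => rubydebug }
}
```

Saved as `test.conf`, this can be started with `bin/logstash -f test.conf`, after which a log line can be pasted into the terminal. Events that do not match the pattern show `_grokparsefailure` in their `tags` field.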

