Issues with grok parsing of logs that vary in format


(Dave) #1

Hello,

I'm having issues with my Logstash grok parsing. I wish the logs were more uniform, but since we have a proxy in place, I can expect from 1 to 4 IP addresses in the Apache log line. What also makes it difficult is the varying number of spaces between them. I could have one line with ip,ip,ip and another line with ip, ip, ip or ip, ip,ip.

The other issue I'm having is with backslashes outside of the quotation marks: on a minority of the log entries, the quoted strings have a backslash before them.

So I'm hoping there is a clever way to write a single pattern that handles all the combinations of spacing, IP count, and backslashes that might occur. Right now I'm using multiple patterns to cover the various cases, but it's starting to get cumbersome. I also tried using the NOTSPACE pattern for the backslash, but I don't think that's working. The grok filters cover 99.9% of my logs; it's just the remaining 0.1% I'm trying to fix now.

My grok filters:

"%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
"%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
"%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{QS:Referrer} %{QS:UserAgent}",
"%{URIHOST:ServerName} (?:%{URIHOST:OriginIP}|-), (?:%{URIHOST}|-) (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:DateOfRequest}\] %{NOTSPACE}%{QS:RequestFirstLine} %{POSINT:HTTPStatus} (?:%{NUMBER:bytes}|-) %{NOTSPACE}%{QS:Referrer} %{NOTSPACE}%{QS:UserAgent}"

An example log entry:

www.abc.cde.tv 11.18.0.30,12.66.55.44 - - [06/Jul/2015:13:49:23 -0600] "GET /hhh/bbb/diamond_bullet.gif HTTP/1.1" 304 - "http://www.tv.abc.tvl/abc" "Mozilla/5.0 (Linux; Android 4.4.4; en-us; SAMSUNG-SGH-I337 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36"

or

www.abc.cde.tv 11.18.0.30, 12.66.55.44 - - [06/Jul/2015:13:49:23 -0600] "GET /hhh/bbb/diamond_bullet.gif HTTP/1.1" 304 - "http://www.tv.abc.tvl/abc" "Mozilla/5.0 (Linux; Android 4.4.4; en-us; SAMSUNG-SGH-I337 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36"

or

www.abc.cde.tv 11.18.0.30, 12.66.55.44,12.33.44.55 - - [06/Jul/2015:13:49:23 -0600] "GET /hhh/bbb/diamond_bullet.gif HTTP/1.1" 304 - "http://www.tv.abc.tvl/abc" "Mozilla/5.0 (Linux; Android 4.4.4; en-us; SAMSUNG-SGH-I337 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36"

Any help would be great.

Thank you


(Magnus Bäck) #2

> I wish the logs were more uniform, but since we have a proxy in place, I can expect from 1 to 4 IP addresses in the Apache log line. What also makes it difficult is the varying number of spaces between them. I could have one line with ip,ip,ip and another line with ip, ip, ip or ip, ip,ip.

This should parse one or more comma-separated IP addresses and store them in an array field:

%{URIHOST:OriginIP}(\s*,\s*%{URIHOST:OriginIP})*
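The regex logic of this suggestion can be checked outside Logstash with Python's `re` module. This is only a sketch under stated assumptions: `HOST` is a simplified stand-in for grok's `URIHOST` (not its real definition), and Python's regex engine is not the Oniguruma engine Logstash uses. It also illustrates a caveat worth testing in Logstash itself: in Python, as in most backtracking regex engines, a named group inside a repeated `(...)*` keeps only its last iteration, so a middle address may not end up in the field. Capturing the whole list as one field and splitting it afterwards sidesteps that question.

```python
import re

# Simplified stand-in for grok's URIHOST (an assumption, not the real definition):
# hostname/IPv4 characters, optionally followed by :port.
HOST = r"[A-Za-z0-9.\-]+(?::\d+)?"

# Same shape as %{URIHOST:OriginIP}(\s*,\s*%{URIHOST:OriginIP})*, but the whole
# list is captured once as OriginIPs and split afterwards.
ORIGIN_LIST = re.compile(rf"(?P<OriginIPs>{HOST}(?:\s*,\s*{HOST})*)")

# Naming the elements instead: a group inside a repeated (...)* keeps only
# its last iteration, so middle addresses are lost.
REPEATED = re.compile(rf"(?P<first>{HOST})(?:\s*,\s*(?P<last>{HOST}))*")


def origin_ips(field: str) -> list:
    """Split a comma-separated address list with any spacing around the commas."""
    m = ORIGIN_LIST.fullmatch(field)
    if m is None:
        raise ValueError(f"not an address list: {field!r}")
    return re.split(r"\s*,\s*", m.group("OriginIPs"))


for sample in ("11.18.0.30,12.66.55.44",
               "11.18.0.30, 12.66.55.44",
               "11.18.0.30, 12.66.55.44,12.33.44.55"):
    print(origin_ips(sample))

m = REPEATED.fullmatch("11.18.0.30, 12.66.55.44,12.33.44.55")
print(m.group("first"), m.group("last"))  # 12.66.55.44 is not available from either group
```

In Logstash, the split would be a separate step after grok (for example with the mutate filter), which this sketch does not cover.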

> The other issue I'm having is with backslashes outside of the quotation marks.

Not sure what you mean by this. Do you actually have logs with \" in them instead of plain double quotes?


(Dave) #3

Thank you

Yes, some log lines have fields that begin with \" and it screws up all my filters. Usually only three fields are affected: in my example above, they are RequestFirstLine, Referrer, and UserAgent.

It doesn't happen in all log lines, just a few; again, a minority of the logs. But if there were a way to parse either

\%{QuotedString} or %{QuotedString}

that would help.


(Magnus Bäck) #4

I suppose (\\)? should match zero or one backslash.
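Combining both suggestions, one pattern may be enough for every variant in this thread. The Python sketch below only approximates that idea to test the regex logic; its assumptions: `HOST` and the date part are simplified stand-ins for grok's `URIHOST` and `HTTPDATE`, `ident`/`auth` are reduced to `\S+`, the field names follow Dave's patterns, and quoted fields use a hypothetical `qfield` helper that allows an optional backslash before each quote. It deliberately does not reuse `%{QS}`: the stock QUOTEDSTRING definition may begin with a `(?<!\\)` lookbehind, in which case `(\\)?%{QS}` would still reject a quote preceded by a backslash, so it is worth checking the grok-patterns file that ships with your Logstash version. The last sample line is made up; the thread shows no backslash-quoted line.

```python
import re

# Simplified stand-in for grok's URIHOST (an assumption, not the real definition).
HOST = r"[A-Za-z0-9.\-]+(?::\d+)?"


def qfield(name: str) -> str:
    # Hypothetical helper: an optional backslash, a quote, then escaped characters
    # or anything other than a backslash or quote (matched lazily), then an
    # optional backslash and the closing quote.
    return rf'\\?"(?P<{name}>(?:\\.|[^\\"])*?)\\?"'


LINE = re.compile(
    rf"(?P<ServerName>{HOST}) "
    rf"(?P<OriginIPs>{HOST}(?:\s*,\s*{HOST})*) "  # 1-4 addresses, any spacing around commas
    r"(?P<ident>\S+) (?P<auth>\S+) "
    r"\[(?P<DateOfRequest>[^\]]+)\] "  # rough stand-in for HTTPDATE
    + qfield("RequestFirstLine")
    + r" (?P<HTTPStatus>\d+) (?P<bytes>\d+|-) "
    + qfield("Referrer") + " " + qfield("UserAgent")
)


def parse(line: str) -> dict:
    m = LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"no match: {line!r}")
    fields = m.groupdict()
    fields["OriginIP"] = re.split(r"\s*,\s*", fields.pop("OriginIPs"))
    return fields


REST = ('- - [06/Jul/2015:13:49:23 -0600] "GET /hhh/bbb/diamond_bullet.gif HTTP/1.1" 304 - '
        '"http://www.tv.abc.tvl/abc" "Mozilla/5.0 (Linux; Android 4.4.4; en-us; '
        'SAMSUNG-SGH-I337 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) '
        'Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36"')

SAMPLES = [
    "www.abc.cde.tv 11.18.0.30,12.66.55.44 " + REST,
    "www.abc.cde.tv 11.18.0.30, 12.66.55.44 " + REST,
    "www.abc.cde.tv 11.18.0.30, 12.66.55.44,12.33.44.55 " + REST,
    # Made-up variant with backslash-escaped quotes (not taken from the thread).
    r'www.abc.cde.tv 11.18.0.30 - - [06/Jul/2015:13:49:23 -0600] \"GET /x HTTP/1.1\" 304 - '
    r'\"http://www.tv.abc.tvl/abc\" \"Mozilla/5.0\"',
]

for s in SAMPLES:
    f = parse(s)
    print(f["OriginIP"], f["RequestFirstLine"])
```

The lazy quoted-field match keeps embedded escaped quotes inside a plain quoted field, but an unusual line could still be split differently than expected, so it is worth testing against real log samples before relying on it.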

