Grok pattern failing domain.com vs http://domain.com/

I am relatively new to ELK and have gotten pretty far setting up a custom grok pattern for my slightly customized Apache logs. It is 90% of the way there, but I'm seeing a failure when the referrer in the log line is a bare "domain.com" rather than a full URL such as "https://domain.com". The former fails; the latter works fine. I've had issues in the past where www.domain.com and domain.com were treated as unequal, and I am trying to avoid that in my latest build.

Log lines. First one fails, second one matches.

12.34.56.78 - - [09/Oct/2017:22:23:00 +0000] domain.com "POST /yt.php HTTP/1.1" 301 240 "domain.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36" Server=aws8 "-" 103224 0
12.34.56.78 - - [09/Oct/2017:22:24:45 +0000] domain2.com "GET /images/icons/32-twitter.png HTTP/1.1" 200 462 "http://domain2.com/programs/coaching.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_2 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A421 Safari/604.1" Server=aws8 "-" 744 0

Current grok.

%{IPORHOST:clientip} - - [%{HTTPDATE:timestamp}] (?:%{IPORHOST:virtualhost}|-) "(?:%{WORD:request_type} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent} Server=(?:%{WORD:server}|-) "(?:-|%{NOTSPACE:ssl_protocol}|-)" %{BASE10NUM:request_duration_ms} %{BASE10NUM:request_duration_s}

It appears to be a double-quoted string just like the user agent so why not use the QS pattern?

I'm not following?

Use the QS pattern to match the domain name, i.e. replace "(?:%{URI:referrer}|-)" with %{QS:referrer}.
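
For context, the stock URI pattern requires a scheme followed by "://", which is why a bare "domain.com" referrer fails while "http://domain2.com/..." matches. A minimal sketch of the swap, showing only the referrer portion of the expression (the lines starting with # are annotations, not part of the pattern):

```
# before: matches only a full URL or a lone dash between the quotes
"(?:%{URI:referrer}|-)"

# after: matches any double-quoted string, including "domain.com" and "-"
%{QS:referrer}
```

One thing to be aware of: QS captures the surrounding double quotes as part of the field value, the same way it already does for agent.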

So far so good. Now I'm seeing some log lines with the port included after the virtualhost "domain.com:80". Thoughts?

163.172.4.153 - - [10/Oct/2017:15:29:39 +0000] domain.com:80 "GET /o-neal-drywall-construction-coming-soon.html HTTP/1.1" 200 3369 "http://domain.com:80/coming-soon.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36" Server=aws8 "-" 269118 0

(?:%{IPORHOST:virtualhost}|%{HOSTPORT:virtualhost}|-) perhaps
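
If you would rather keep the port out of virtualhost (so that "domain.com" and "domain.com:80" do not end up as two different values), one possible variation is to make the port an optional suffix. This is only a sketch, and the field name virtualport is made up for this example:

```
(?:%{IPORHOST:virtualhost}(?::%{POSINT:virtualport})?|-)
```

With this, "domain.com:80" should yield virtualhost "domain.com" and virtualport "80", while a plain "domain.com" leaves virtualport unset.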

Works, thanks!

Latest:
%{IPORHOST:clientip} - - [%{HTTPDATE:timestamp}] (?:%{IPORHOST:virtualhost}|%{HOSTPORT:virtualhost}|-) "(?:%{WORD:request_type} (?:%{URIPATHPARAM:request}|*) HTTP/%{NUMBER:httpversion}|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:%{QS:referrer}|-)\ %{QS:agent} Server=(?:%{WORD:server}|-) "(?:-|%{NOTSPACE:ssl_protocol}|-)" %{BASE10NUM:request_duration_ms} %{BASE10NUM:request_duration_s}

This is failing at HEAD (I think!).

92.239.124.14 - - [10/Oct/2017:18:02:24 +0000] 98.76.54.31:80 "HEAD http://98.76.54.3:80/mysql/mysqlmanager/ HTTP/1.1" 404 - "-" "Mozilla/5.0 Jorgee" Server=aws7 "-" 288 0


It's failing just after HEAD. 'http://98.76.54.3:80/mysql/mysqlmanager/' does not match (?:%{URIPATHPARAM:request}|*), which I do not even think is a valid grok pattern. You really need to make sure what is displayed in the post matches the actual pattern (note the backslash on the square brackets in my version). Use the </> (preformatted text) button.

%{IPORHOST:clientip} - - \[%{HTTPDATE:timestamp}\] (?:%{IPORHOST:virtualhost}|%{HOSTPORT:virtualhost}|-) "(?:%{WORD:request_type} (?:%{URIPATHPARAM:request}|.*) HTTP/%{NUMBER:httpversion}|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:%{QS:referrer}|-)\ %{QS:agent} Server=(?:%{WORD:server}|-) "(?:-|%{NOTSPACE:ssl_protocol}|-)" %{BASE10NUM:request_duration_ms} %{BASE10NUM:request_duration_s}

This matches, but does not capture a named request. You could give that .* a name using a named capture group:

%{IPORHOST:clientip} - - \[%{HTTPDATE:timestamp}\] (?:%{IPORHOST:virtualhost}|%{HOSTPORT:virtualhost}|-) "(?:%{WORD:request_type} (?:%{URIPATHPARAM:request}|(?<request>.*)) HTTP/%{NUMBER:httpversion}|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:%{QS:referrer}|-)\ %{QS:agent} Server=(?:%{WORD:server}|-) "(?:-|%{NOTSPACE:ssl_protocol}|-)" %{BASE10NUM:request_duration_ms} %{BASE10NUM:request_duration_s}

It is not very pretty, but it does match.
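
A possibly tidier alternative, assuming the request target never contains a space, is to replace the whole alternation with NOTSPACE. Only the request portion of the expression is shown here:

```
"(?:%{WORD:request_type} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}|-)"
```

That should cover relative paths, absolute URLs such as http://98.76.54.3:80/mysql/mysqlmanager/, and a literal * (which was presumably the intent of the |* in the earlier version), all captured under the single field name request.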

You have been very helpful, thank you. I'm down to less than 2% of my log lines in the last 24 hours having a parse failure, so it's getting better! The next one I need to tackle is below. It always shows up with a 408 request timeout, which does not complete the log line and throws an error. I'd like to keep the data so that I can have reports on all error types. How can I get my grok to capture as much of the data as is present, even if the line leaves off the trailing fields?

xx.xx.xx.xx - - [11/Oct/2017:11:50:00 +0000] "-" 408 -

Either use a completely different grok expression (a grok filter can list multiple expressions that will be tried in order) or you could make everything after "408 -" optional with (...)?.
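
For the second option, note that the 408 line is also missing the virtual host, so that part would have to become optional too, not just the tail. An untested sketch of what that might look like, based on the expression above:

```
%{IPORHOST:clientip} - - \[%{HTTPDATE:timestamp}\] (?:(?:%{IPORHOST:virtualhost}|%{HOSTPORT:virtualhost}|-) )?"(?:%{WORD:request_type} (?:%{URIPATHPARAM:request}|(?<request>.*)) HTTP/%{NUMBER:httpversion}|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-)(?: (?:%{QS:referrer}|-)\ %{QS:agent} Server=(?:%{WORD:server}|-) "(?:-|%{NOTSPACE:ssl_protocol}|-)" %{BASE10NUM:request_duration_ms} %{BASE10NUM:request_duration_s})?
```

The trade-off is that grok patterns are not anchored, so a full line whose tail fails to parse would still match the short form and silently lose those fields. Appending $ to the expression is one way to guard against that.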

Is that done like below with multiple grok match?

grok {
match => { 'message' => ''}
match => { 'message' => ''}
}

I'm not sure if that syntax works. There's an example of the supported syntax in the grok filter documentation.

Can you point me to the right place?

Search for "If you need to match multiple patterns".

https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
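
As a rough sketch of that documented form, the value for the field is an array of patterns that are tried in order. The first pattern is the full expression from earlier in this thread; the second is an assumed short form for the truncated 408 lines and has not been tested against real data. Single-quoted strings are used so the double quotes inside the patterns need no escaping.

```
filter {
  # First pattern: full access-log line.
  # Second pattern: truncated line logged on a 408 request timeout.
  grok {
    match => {
      "message" => [
        '%{IPORHOST:clientip} - - \[%{HTTPDATE:timestamp}\] (?:%{IPORHOST:virtualhost}|%{HOSTPORT:virtualhost}|-) "(?:%{WORD:request_type} (?:%{URIPATHPARAM:request}|(?<request>.*)) HTTP/%{NUMBER:httpversion}|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:%{QS:referrer}|-)\ %{QS:agent} Server=(?:%{WORD:server}|-) "(?:-|%{NOTSPACE:ssl_protocol}|-)" %{BASE10NUM:request_duration_ms} %{BASE10NUM:request_duration_s}',
        '%{IPORHOST:clientip} - - \[%{HTTPDATE:timestamp}\] "-" %{NUMBER:response} (?:%{NUMBER:bytes}|-)'
      ]
    }
  }
}
```

By default grok stops at the first pattern that matches (break_on_match), so it makes sense to list the most specific pattern first.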
