When I parse my Apache access logs with the following grok rule, I get 0 grokparsefailures:
match => { "message" => ["%{COMBINEDAPACHELOG}"] }
But when I do the same with the following grok pattern, which I found in this Elastic documentation, more than 30% of the events fail to parse.
The pattern from the documentation is below:
grok {
match => { "message" => ["%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \[%{HTTPDATE:[apache2][access][time]}\] \"%{WORD:[apache2][access][method]} %{DATA:[apache2][access][url]} HTTP/%{NUMBER:[apache2][access][http_version]}\" %{NUMBER:[apache2][access][response_code]} %{NUMBER:[apache2][access][body_sent][bytes]}( \"%{DATA:[apache2][access][referrer]}\")?( \"%{DATA:[apache2][access][agent]}\")?",
"%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \\[%{HTTPDATE:[apache2][access][time]}\\] \"-\" %{NUMBER:[apache2][access][response_code]} -" ] }
remove_field => "message"
}
Can anyone tell me what is happening here?
Also, where can I find the complete grok patterns used by %{COMBINEDAPACHELOG}?
Coinology
(Joshua Jones)
June 14, 2020, 12:01am
2
See the httpd patterns here:
HTTPDUSER %{EMAILADDRESS}|%{USER}
HTTPDERROR_DATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}
# Log formats
HTTPD_COMMONLOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}
# Error logs
HTTPD20_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{LOGLEVEL:loglevel}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:message}
HTTPD24_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{WORD:module}:%{LOGLEVEL:loglevel}\] \[pid %{POSINT:pid}(:tid %{NUMBER:tid})?\]( \(%{POSINT:proxy_errorcode}\)%{DATA:proxy_message}:)?( \[client %{IPORHOST:clientip}:%{POSINT:clientport}\])?( %{DATA:errorcode}:)? %{GREEDYDATA:message}
HTTPD_ERRORLOG %{HTTPD20_ERRORLOG}|%{HTTPD24_ERRORLOG}
# Deprecated
COMMONAPACHELOG %{HTTPD_COMMONLOG}
COMBINEDAPACHELOG %{HTTPD_COMBINEDLOG}
Does that answer your question?
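One difference visible in the patterns above is the bytes field. HTTPD_COMMONLOG accepts `(?:%{NUMBER:bytes}|-)`, while the first pattern from the documentation requires `%{NUMBER:[apache2][access][body_sent][bytes]}`, and its second pattern only covers requests logged as `"-"`. A line with a real request and `-` for bytes would therefore fail both documentation patterns. Whether this accounts for the 30% depends on the logs. Below is a minimal Python sketch of the difference. The regexes are simplified stand-ins for the grok patterns (NUMBER is approximated as `\d+`), and the sample log tails are hypothetical:

```python
import re

# Simplified stand-ins for the tail of each grok pattern (status code + bytes).
# These are illustrative approximations, not the real grok definitions.
DOC_TAIL = re.compile(r'" (?P<response>\d+) (?P<bytes>\d+)')                # bytes must be a number
COMMONLOG_TAIL = re.compile(r'" (?P<response>\d+) (?:(?P<bytes>\d+)|-)')    # bytes may be a number or "-"

# Hypothetical tails of two access log lines.
with_bytes = 'HTTP/1.1" 200 512'
without_bytes = 'HTTP/1.1" 200 -'


def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text (grok is unanchored too)."""
    return pattern.search(text) is not None


results = {
    "doc_with_bytes": matches(DOC_TAIL, with_bytes),
    "doc_without_bytes": matches(DOC_TAIL, without_bytes),
    "commonlog_with_bytes": matches(COMMONLOG_TAIL, with_bytes),
    "commonlog_without_bytes": matches(COMMONLOG_TAIL, without_bytes),
}

for name, ok in results.items():
    print(name, ok)
```

Only the documentation-style tail rejects the line whose bytes field is `-`.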
Hi @Coinology
I tried the raw pattern above against the following log line, but it did not match:
45.203.10.210 - - [13/Jan/2020:05:30:38 +0000] \"POST /wp-cron.php?doing_wp_cron=149332.6687 HTTP/1.1\" 200 - \"http://technopart.com/wp-cron.php?doing_wp_cron=149332.6687\" \"WordPress/4.7.2; http://technopart.com\"
I modified the raw pattern slightly to try to get a match, but without success.
The pattern I tried was:
%{IPORHOST:clientip} %{DATA:ident} %{DATA:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
When I parse this with Logstash using the %{COMBINEDAPACHELOG} pattern, it succeeds, though.
Can you please help me with this?
Coinology
(Joshua Jones)
June 14, 2020, 1:55pm
4
@curiousmind I'm no grok expert, so I'm sorry if the answer is obvious, but if %{COMBINEDAPACHELOG} is working for you, why do you want to use the raw pattern?
Coinology
(Joshua Jones)
June 14, 2020, 2:41pm
5
@curiousmind I believe the issue is twofold: the backslashes in your log + your agent section. I've adjusted the raw pattern as below and it seems to be working, but I'm not sure whether this is the best way to handle it or not. Give it a try and tell me what you think.
%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \\"%{NOTSPACE:referrer}\\" \\"%{GREEDYDATA:agent}\\"
I'm still perplexed as to why you would be getting matches with %{COMBINEDAPACHELOG} but grokparsefailures with the raw version of the pattern.
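The effect of the backslashes can be illustrated with a small Python sketch. It assumes the backslashes are literally present in the line being tested, and it uses a simplified regex as a stand-in for the request and status part of the grok pattern (the `build` helper and the regex are illustrative, not the real grok definitions):

```python
import re


def build(quote):
    """Build a simplified request/status regex; `quote` is the regex used for each double quote."""
    return re.compile(
        r'\[(?P<timestamp>[^\]]+)\] ' + quote
        + r'(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)' + quote
        + r' (?P<response>\d+) (?:(?P<bytes>\d+)|-)'
    )


PLAIN = build(r'"')      # expects a bare double quote
ESCAPED = build(r'\\"')  # expects a literal backslash followed by a double quote

# The log line as pasted in the thread, with a literal backslash before each quote.
pasted = r'45.203.10.210 - - [13/Jan/2020:05:30:38 +0000] \"POST /wp-cron.php?doing_wp_cron=149332.6687 HTTP/1.1\" 200 - \"http://technopart.com/wp-cron.php?doing_wp_cron=149332.6687\" \"WordPress/4.7.2; http://technopart.com\"'

# The same line with the backslashes removed, as a plain access log would contain it.
unescaped = pasted.replace('\\"', '"')

for label, line in (("pasted", pasted), ("unescaped", unescaped)):
    print(label, "plain:", PLAIN.search(line) is not None,
          "escaped:", ESCAPED.search(line) is not None)
```

The plain-quote regex matches only the unescaped line, and the backslash-aware regex matches only the pasted one. One possible explanation for the difference in behaviour, offered only as a guess: if the backslashes were introduced when the line was copied (for example from JSON-escaped output) and are absent from the file Logstash actually reads, then %{COMBINEDAPACHELOG} would match in Logstash while the same pattern fails against the pasted line.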