Where can I find the grok pattern for COMBINEDAPACHELOG in Logstash?

When I parse the Apache access logs with the following grok rule, I get 0 grokparsefailures:
match => { "message" => ["%{COMBINEDAPACHELOG}"] }

But when I do the same with the following grok pattern, which I found in this Elastic documentation, more than 30% of the lines fail to parse.
The pattern from that documentation is below:

 grok {
        match => { "message" => ["%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \[%{HTTPDATE:[apache2][access][time]}\] \"%{WORD:[apache2][access][method]} %{DATA:[apache2][access][url]} HTTP/%{NUMBER:[apache2][access][http_version]}\" %{NUMBER:[apache2][access][response_code]} %{NUMBER:[apache2][access][body_sent][bytes]}( \"%{DATA:[apache2][access][referrer]}\")?( \"%{DATA:[apache2][access][agent]}\")?",
          "%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \\[%{HTTPDATE:[apache2][access][time]}\\] \"-\" %{NUMBER:[apache2][access][response_code]} -" ] }
        remove_field => "message"
      }

Can anyone tell me what is happening here?
Also, where can I find the complete grok patterns used by %{COMBINEDAPACHELOG}?
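
One possible explanation, offered as a guess rather than a confirmed diagnosis: Apache writes `-` for the response size when no body is sent. The first documentation pattern only accepts `%{NUMBER}` in that position, and the second only matches a request logged as `"-"`, so a line with a normal request and a `-` size would match neither. A minimal sketch of a more tolerant first pattern, assuming that is the cause (single quotes are used so the literal double quotes need no escaping):

```
filter {
  grok {
    match => {
      # Same fields as the documentation pattern, except that the size
      # may be either a number or a literal "-".
      "message" => '%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \[%{HTTPDATE:[apache2][access][time]}\] "%{WORD:[apache2][access][method]} %{DATA:[apache2][access][url]} HTTP/%{NUMBER:[apache2][access][http_version]}" %{NUMBER:[apache2][access][response_code]} (?:%{NUMBER:[apache2][access][body_sent][bytes]}|-)( "%{DATA:[apache2][access][referrer]}")?( "%{DATA:[apache2][access][agent]}")?'
    }
  }
}
```

If the failure rate does not drop with this change, the cause lies elsewhere and the failing lines themselves are the best place to look.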

See the httpd patterns here:

Does that answer your question?
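
For reference, a sketch of how the combined pattern is composed, based on my recollection of the legacy httpd patterns file: the common-log pattern followed by two quoted strings. The exact definitions differ between versions of the patterns, so treat this as an approximation and check the file shipped with your install. `MY_COMMONLOG` and `MY_COMBINEDLOG` are hypothetical names chosen here to avoid clashing with the built-in ones.

```
filter {
  grok {
    # Inline definitions; the MY_ names are illustrative only.
    pattern_definitions => {
      "MY_COMMONLOG" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)'
      # Combined log = common log + quoted referrer + quoted user agent.
      "MY_COMBINEDLOG" => '%{MY_COMMONLOG} %{QS:referrer} %{QS:agent}'
    }
    match => { "message" => '%{MY_COMBINEDLOG}' }
  }
}
```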

Hi @Coinology
I tried the above raw pattern to match the following log line, but it was not successful:

45.203.10.210 - - [13/Jan/2020:05:30:38 +0000] \"POST /wp-cron.php?doing_wp_cron=149332.6687 HTTP/1.1\" 200 - \"http://technopart.com/wp-cron.php?doing_wp_cron=149332.6687\" \"WordPress/4.7.2; http://technopart.com\"

I modified the raw pattern a bit in order to get a match but was not successful.
The pattern I tried using was

%{IPORHOST:clientip} %{DATA:ident} %{DATA:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) 


When I parse this with Logstash using the %{COMBINEDAPACHELOG} pattern, it is successful though.

Can you please help me with this?

@curiousmind I'm no grok expert so I'm sorry if the answer is obvious, but if %{COMBINEDAPACHELOG} is working for you, why do you want to use the raw pattern?

@curiousmind I believe the issue is twofold: the backslashes in your log and your agent section. I've adjusted the raw pattern as below and it seems to work, but I'm not sure whether this is the best way to handle it. Give it a try and tell me what you think.

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \\"%{NOTSPACE:referrer}\\" \\"%{GREEDYDATA:agent}\\"
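
An alternative sketch, in case the backslashes really are part of the `message` field: strip them first with a `mutate` filter and then reuse the stock pattern. This assumes the default setting where `config.support_escapes` is disabled, so the single-quoted `\\"` reaches the regex engine unchanged and matches a literal backslash followed by a quote. I have not verified it against your data.

```
filter {
  # Turn every \" in the raw line back into a plain ".
  mutate {
    gsub => [ "message", '\\"', '"' ]
  }
  # With the backslashes gone, the built-in pattern can be used as-is.
  grok {
    match => { "message" => '%{COMBINEDAPACHELOG}' }
  }
}
```

The design choice here is to normalize the input once rather than carry the escaping into every pattern, which keeps the grok expression identical to the one that already works.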

I'm still perplexed as to why you would be getting matches with %{COMBINEDAPACHELOG} but grokparsefailures with the raw version of the pattern.
