Grok parse error on custom apache log file

dlb1001 · September 21, 2015, 1:29pm

Hello,

I'm trying to set up a filter for a custom apache log. The log contains two different types of lines. The first looks like this:

www.bbb.aaa.mil 33.44.112.28 - - [13/Sep/2015:04:02:37 -0400] "HEAD /www/default.htm HTTP/1.1" 302 - "http://www.fullerton.edu/ord/resources/federal-agencies-list.asp" "Mozilla/4.0 (compatiable; MSIE 7.0; Windows NT 5.1) HiScan"

The other type of line has two IPs following the hostname:

www.bbb.aaa.mil  11.22.33.44,33.44.112.28 - - [13/Sep/2015:04:02:37 -0400] "HEAD /www/default.htm HTTP/1.1" 302 - "http://www.fullerton.edu/ord/resources/thelist.asp" "Mozilla/4.0 (compatiable; MSIE 7.0; Windows NT 5.1) HiScan"

I used a grok debugger to work out how to parse the line, and it suggested this combination:

%{URIHOST} %{IP}, %{COMBINEDAPACHELOG}

To troubleshoot, I've been running Logstash with stdin and this conf file:

input { stdin { } }

filter {
  grok {
    match => { "message" => "%{URIHOST} %{IP}, %{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}

But I still get a grok parse error. This is what it actually returns:

{
       "message" => "www.bbb.aaa.mil 33.44.112.28 - - [13/Sep/2015:04:02:37 -0400] \"HEAD /www/default.htm HTTP/1.1\" 302 - \"http://www.fullerton.edu/ord/resources/federal-agencies-list.asp\" \"Mozilla/4.0 (compatiable; MSIE 7.0; Windows NT 5.1) HiScan\"",
      "@version" => "1",
    "@timestamp" => "2015-09-21T13:15:11.896Z",
          "host" => "happy.cnn.abc.nz",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Anyone have any ideas on how to remedy the parse error?

Thank you

Logstash Version: 1.5.4

Dave

magnusbaeck · September 21, 2015, 2:37pm

Your pattern only works if there are two IP addresses present, but the input line that fails only has one. To solve this you can e.g. specify two patterns and grok will try them in order and use the first that matches.

The following works and captures the virtual host and both client IP addresses:

filter {
  grok {
    match => {
      "message" => [
        "%{URIHOST:vhost} +%{COMBINEDAPACHELOG}",
        "%{URIHOST:vhost} +%{IP:clientip},%{COMBINEDAPACHELOG}"
      ]
    }
  }
}
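The "try each pattern in order and use the first that matches" behaviour can be sketched outside Logstash. The Python below is only an illustration: the regexes are heavily simplified stand-ins for URIHOST, IP and COMBINEDAPACHELOG (not the real grok definitions), the field name `forwarded` is made up for this example, and the sample lines are shortened versions of the two entries from the first post.

```python
import re

# Simplified stand-ins for the grok patterns; NOT the real definitions.
HOST = r"(?P<vhost>\S+)"
IP = r"\d{1,3}(?:\.\d{1,3}){3}"
REST = r"(?P<rest>- - \[.*)"  # everything from the "- -" fields onward

# Tried in order, like an array of patterns given to grok's match option.
PATTERNS = [
    re.compile(rf"{HOST} +(?P<clientip>{IP}) {REST}"),
    re.compile(rf"{HOST} +(?P<forwarded>{IP}),(?P<clientip>{IP}) {REST}"),
]


def first_match(line):
    """Return the captured fields from the first pattern that matches, else None."""
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return m.groupdict()
    return None


# Shortened versions of the two log lines from the first post.
single = 'www.bbb.aaa.mil 33.44.112.28 - - [13/Sep/2015:04:02:37 -0400] "HEAD /www/default.htm HTTP/1.1" 302 -'
double = 'www.bbb.aaa.mil  11.22.33.44,33.44.112.28 - - [13/Sep/2015:04:02:37 -0400] "HEAD /www/default.htm HTTP/1.1" 302 -'

for line in (single, double):
    print(first_match(line))
```

The one-IP line is caught by the first pattern and the two-IP line falls through to the second; a line that fits neither returns None, which is roughly the situation in which grok adds the _grokparsefailure tag.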

dlb1001 · September 21, 2015, 3:00pm

Magnus,

Thank you for responding. I've been trying to get a handle on the grok pattern syntax. With this in mind, instead of using the built-in grok pattern %{COMBINEDAPACHELOG}, I'm trying to create my own, just to verify my understanding. With my own patterns, I still get grok parse errors in the config test I'm running. Could you look at my patterns and tell me where I'm going wrong?

Here they are:

DoubleIP
"%{HOST:ServerName} %{IP}, %{IP} %{HTTPDATE} %{QUOTEDSTRING:RequestFirstLine} %{POSINT:HTTPStatus} %{URI:Referrer} %{QUOTEDSTRING:UserAgent}"

SingleIP
"%{HOST:ServerName} %{IP} %{HTTPDATE} %{QUOTEDSTRING:RequestFirstLine} %{POSINT:HTTPStatus} %{URI:Referrer} %{QUOTEDSTRING:UserAgent}"

magnusbaeck · September 21, 2015, 4:33pm

Pay attention to the whitespace. Your example log entries have no space after the comma that separates the two IP addresses, but there are two spaces after the hostname (i.e. before the IP addresses).
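A quick way to see that both whitespace details matter is to test them one at a time. This is a rough Python sketch, not grok itself: `IP` is a simplified stand-in for grok's IP pattern, `\S+` stands in for HOST, and only the start of the two-IP log line is checked.

```python
import re

IP = r"\d{1,3}(?:\.\d{1,3}){3}"  # simplified stand-in for grok's IP pattern
PREFIX = "www.bbb.aaa.mil  11.22.33.44,33.44.112.28"  # start of the two-IP line

# Four variants of the start of the DoubleIP pattern.
CANDIDATES = {
    "as posted": rf"\S+ {IP}, {IP}",          # one space after host, comma + space
    "comma fixed only": rf"\S+ {IP},{IP}",    # still exactly one space after host
    "spaces fixed only": rf"\S+ +{IP}, {IP}",  # still expects a space after the comma
    "both fixed": rf"\S+ +{IP},{IP}",
}


def check(name):
    """Return True if the named candidate matches the start of the log line."""
    return re.match(CANDIDATES[name], PREFIX) is not None


for name in CANDIDATES:
    print(f"{name}: {check(name)}")
```

Only the last variant matches, so fixing just the comma or just the spacing after the hostname is not enough for this line.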

dlb1001 · September 21, 2015, 5:22pm

I feel like I'm close to understanding the syntax, but I don't get why you have the + sign in front of %{COMBINEDAPACHELOG} in the first pattern and in front of %{IP:clientip},%{COMBINEDAPACHELOG} in the second. Do you need the plus sign to handle spaces in the expression?

magnusbaeck · September 21, 2015, 5:26pm

In regular expressions, a plus sign means "one or more occurrences of the preceding token". In this case the preceding token is a space, so it's a way to be more lax about the number of spaces and handle one, two, or even ten of them.
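The quantifier is easy to try in isolation. The sketch below uses Python's re module rather than the regex engine grok uses, on the assumption that " +" behaves the same way in both; the words "host" and "ip" are placeholders, not grok patterns.

```python
import re

# " +" is a literal space followed by "+": one or more spaces.
PATTERN = re.compile(r"host +ip")


def matches(space_count):
    """Return True if 'host' and 'ip' separated by space_count spaces match."""
    text = "host" + " " * space_count + "ip"
    return PATTERN.fullmatch(text) is not None


for n in (0, 1, 2, 10):
    print(n, matches(n))
```

Zero spaces fails because "+" requires at least one occurrence; one, two and ten spaces all match.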
