Want to parse fields in Apache logs, where some fields are optional (may or may not be present in the logs)

Hi,
I am parsing Apache logs from two Apache servers. The two servers have different log formats. I want to parse the logs and extract all the fields. Some fields are present in server-1 logs but not in server-2 logs, and vice versa. I want to write the pipeline in such a way that if any of the fields is not available, it skips that field and parses the remaining part. Here are some sample logs:

127.0.0.1 - - [11/Sep/2017:20:20:30 +0530] "GET /" 403 202 0 91
127.0.0.1 - - [25/Jan/2019:10:22:12 +0530] "GET /index.html HTTP/1.1" 200 25692
127.0.0.1 - - [25/Jan/2019:10:22:16 +0530] "-" 408 -
127.0.0.1 - - [28/Jul/2016:15:17:23 +0530] "GET /hello.png HTTP/1.1" 304 -
127.0.0.1 - - [09/Nov/2018:10:58:28 +0530] "GET / HTTP/1.1" 403 4897 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
127.0.0.1 - - [09/Nov/2018:10:58:28 +0530] "GET /css/bootstrap.min.css HTTP/1.1" 200 19341 "http://127.0.0.1/" "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"

Here is my pipeline

filter {
  if [fields][log] == "apache" {
    dissect {
      mapping => {
        "message" => '%{clientip} %{ident} %{auth} [%{timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{bytes} %{restOfLine}'
      }
    }
    grok {
      match => {
        "restOfLine" => [
          "^(?:%{NUMBER:tosinsec}|-) (?:%{NUMBER:tosinmicrosec}|-)",
          "^%{QS:referer} %{QS:agent}"
        ]
      }
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      locale => "en"
    }
    geoip {
      source => "clientip"
    }
    useragent {
      source => "agent"
      target => "useragent"
    }
  }
}

Please suggest how to parse such logs in one pipeline.

What don't you like about the output of that pipeline? This is what I get...

{
       "auth" => "-",
   "clientip" => "127.0.0.1",
      "bytes" => "19341",
    "request" => "/css/bootstrap.min.css",
    "message" => "127.0.0.1 - - [09/Nov/2018:10:58:28 +0530] \"GET /css/bootstrap.min.css HTTP/1.1\" 200 19341 \"http://127.0.0.1/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0\"",
 "restOfLine" => "\"http://127.0.0.1/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0\"",
   "response" => "200",
   "@version" => "1",
      "ident" => "-",
  "timestamp" => "09/Nov/2018:10:58:28 +0530",
 "@timestamp" => 2018-11-09T05:28:28.000Z,
       "tags" => [
    [0] "_geoip_lookup_failure"
],
       "verb" => "GET",
      "agent" => "\"Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0\"",
  "useragent" => {
     "device" => "Other",
         "os" => "Linux",
      "minor" => "0",
      "build" => "",
       "name" => "Firefox",
    "os_name" => "Linux",
      "major" => "60"
},
    "referer" => "\"http://127.0.0.1/\"",
"httpversion" => "1.1",
      "geoip" => {}
}

A geoip lookup failure for 127.0.0.1 (or anything in 192.168/16) is expected.
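If both public and private addresses can show up in clientip, one option (just a sketch using the cidr filter, which is not part of your current pipeline) is to tag the loopback/private ranges and only run geoip when that tag is absent:

cidr {
  # Tag addresses that geoip will never resolve (loopback and RFC 1918 ranges)
  address => [ "%{clientip}" ]
  network => [ "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16" ]
  add_tag => [ "private_ip" ]
}
if "private_ip" not in [tags] {
  geoip { source => "clientip" }
}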

Hi Badger,
"127.0.0.1" is sample IP. Any IP can come there (even public IP).

Can you try to parse logs with a response code other than 200? Please try to parse the following logs:

127.0.0.1 - - [11/Sep/2017:20:20:30 +0530] "GET /" 403 202 0 91
127.0.0.1 - - [25/Jan/2019:10:22:12 +0530] "GET /index.html HTTP/1.1" 200 25692
127.0.0.1 - - [25/Jan/2019:10:22:16 +0530] "-" 408 -
127.0.0.1 - - [28/Jul/2016:15:17:23 +0530] "GET /hello.png HTTP/1.1" 304 -

I am getting "grok failure" error while parsing the above logs. Because some of the fields are missing/no value.

Thanks.

You should also be getting _dissectfailure. You will have to reduce the dissect to

dissect { mapping => { "message" => '%{clientip} %{ident} %{auth} [%{timestamp}] "%{request}" %{restOfLine}' } }

and increase the number of grok patterns, and also run grok against the request field.
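For example, something along these lines (an untested sketch; tosinsec and tosinmicrosec are taken from your original pipeline, and request_path is just a name I picked for the path portion of the request) should cover those four sample lines as well as the longer ones:

grok {
  # Most specific patterns first; each sample line should match exactly one of them
  match => {
    "restOfLine" => [
      "^%{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referer} %{QS:agent}$",
      "^%{NUMBER:response} %{NUMBER:bytes} %{NUMBER:tosinsec} %{NUMBER:tosinmicrosec}$",
      "^%{NUMBER:response} (?:%{NUMBER:bytes}|-)$"
    ]
  }
}
grok {
  # The request field from dissect may be "VERB path HTTP/x.y", "VERB path", or just "-"
  match => {
    "request" => [
      "^%{WORD:verb} %{NOTSPACE:request_path} HTTP/%{NUMBER:httpversion}$",
      "^%{WORD:verb} %{NOTSPACE:request_path}$",
      "^-$"
    ]
  }
}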
