sandy6
(sandy)
January 31, 2019, 4:57am
Hi,
I am parsing Apache logs from 2 Apache servers. Both have different log formats. I want to parse the logs and extract all the fields. Some fields are available in server-1 logs that are not present in server-2 logs, and vice versa. I want to write the pipeline in such a way that if any of the fields is not available, it skips that field and parses the remaining part. Here are some sample logs:
127.0.0.1 - - [11/Sep/2017:20:20:30 +0530] "GET /" 403 202 0 91
127.0.0.1 - - [25/Jan/2019:10:22:12 +0530] "GET /index.html HTTP/1.1" 200 25692
127.0.0.1 - - [25/Jan/2019:10:22:16 +0530] "-" 408 -
127.0.0.1 - - [28/Jul/2016:15:17:23 +0530] "GET /hello.png HTTP/1.1" 304 -
127.0.0.1 - - [09/Nov/2018:10:58:28 +0530] "GET / HTTP/1.1" 403 4897 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
127.0.0.1 - - [09/Nov/2018:10:58:28 +0530] "GET /css/bootstrap.min.css HTTP/1.1" 200 19341 "http://127.0.0.1/ " "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
Here is my pipeline
filter {
  if [fields][log] == "apache" {
    dissect {
      mapping => {
        "message" => '%{clientip} %{ident} %{auth} [%{timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{bytes} %{restOfLine}'
      }
    }
    grok {
      match => {
        "restOfLine" => [
          "^(?:%{NUMBER:tosinsec}|-) (?:%{NUMBER:tosinmicrosec}|-)",
          "^%{QS:referer} %{QS:agent}"
        ]
      }
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      locale => en
    }
    geoip {
      source => "clientip"
    }
    useragent {
      source => "agent"
      target => "useragent"
    }
  }
}
Please suggest, how to parse such logs in one pipeline.
Badger
January 31, 2019, 2:50pm
What don't you like about the output of that pipeline? This is what I get...
{
    "auth" => "-",
    "clientip" => "127.0.0.1",
    "bytes" => "19341",
    "request" => "/css/bootstrap.min.css",
    "message" => "127.0.0.1 - - [09/Nov/2018:10:58:28 +0530] \"GET /css/bootstrap.min.css HTTP/1.1\" 200 19341 \"http://127.0.0.1/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0\"",
    "restOfLine" => "\"http://127.0.0.1/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0\"",
    "response" => "200",
    "@version" => "1",
    "ident" => "-",
    "timestamp" => "09/Nov/2018:10:58:28 +0530",
    "@timestamp" => 2018-11-09T05:28:28.000Z,
    "tags" => [
        [0] "_geoip_lookup_failure"
    ],
    "verb" => "GET",
    "agent" => "\"Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0\"",
    "useragent" => {
        "device" => "Other",
        "os" => "Linux",
        "minor" => "0",
        "build" => "",
        "name" => "Firefox",
        "os_name" => "Linux",
        "major" => "60"
    },
    "referer" => "\"http://127.0.0.1/\"",
    "httpversion" => "1.1",
    "geoip" => {}
}
A geoip lookup failure for 127.0.0.1 (or anything in 192.168/16) is expected.
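If you want to avoid the failure tag for such addresses entirely, one option is to skip the lookup for private/loopback IPs. A sketch using the cidr filter (the network list and the private_ip tag name are assumptions; adjust them to your environment):

```
filter {
  # Tag loopback and RFC 1918 addresses so geoip can be skipped for them.
  cidr {
    address => [ "%{clientip}" ]
    network => [ "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16" ]
    add_tag => [ "private_ip" ]
  }
  if "private_ip" not in [tags] {
    geoip { source => "clientip" }
  }
}
```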
sandy6
(sandy)
February 1, 2019, 5:26am
Hi Badger,
"127.0.0.1" is sample IP. Any IP can come there (even public IP).
Can you try to parse the logs with response code other than 200. Please try to parse the following logs
127.0.0.1 - - [11/Sep/2017:20:20:30 +0530] "GET /" 403 202 0 91
127.0.0.1 - - [25/Jan/2019:10:22:12 +0530] "GET /index.html HTTP/1.1" 200 25692
127.0.0.1 - - [25/Jan/2019:10:22:16 +0530] "-" 408 -
127.0.0.1 - - [28/Jul/2016:15:17:23 +0530] "GET /hello.png HTTP/1.1" 304 -
I am getting a "grok failure" error while parsing the above logs, because some of the fields are missing or have no value.
Thanks.
Badger
February 1, 2019, 1:47pm
You should also be getting _dissectfailure. You will have to reduce the dissect to
dissect { mapping => { "message" => '%{clientip} %{ident} %{auth} [%{timestamp}] "%{request}" %{restOfLine}' } }
and increase the number of grok patterns, and also run grok against the request field.
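To make that concrete, here is a sketch of the adjusted filter. It keeps the field names from the original pipeline (tosinsec, tosinmicrosec) and adds a hypothetical uri field, since with the reduced dissect %{request} now holds the whole quoted request string. The patterns are ordered most specific first and are only checked against the four sample lines above, so treat this as a starting point rather than a tested solution:

```
filter {
  if [fields][log] == "apache" {
    dissect {
      mapping => {
        "message" => '%{clientip} %{ident} %{auth} [%{timestamp}] "%{request}" %{restOfLine}'
      }
    }
    # Allow "-" requests (e.g. 408 timeouts) and requests without an HTTP version.
    grok {
      match => {
        "request" => "^(?:%{WORD:verb} %{NOTSPACE:uri}(?: HTTP/%{NUMBER:httpversion})?|-)$"
      }
    }
    # Most specific pattern first; bytes may be "-" for 304/408 responses.
    grok {
      match => {
        "restOfLine" => [
          "^%{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:tosinsec} %{NUMBER:tosinmicrosec}$",
          "^%{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referer} %{QS:agent}$",
          "^%{NUMBER:response} (?:%{NUMBER:bytes}|-)$"
        ]
      }
    }
  }
}
```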
system
(system)
Closed
March 1, 2019, 1:47pm
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.