Hi there,
I am having some difficulty with a custom log format when sending it into Logstash via the beats input. If I send the same log line into Logstash (with the same filters as the beats input) via stdin, the event is parsed perfectly.
Both pipelines get the event into Elasticsearch, but the filebeat pipeline fails to parse the log properly, adding a _grokparsefailure tag to the event, whereas the stdin pipeline does a perfect job of it.
This is the log line in question:
2016-06-11T16:28:33+10:00 60.241.43.104 200 GET /test.gif?v=1&_v=j44&a=1565401534&t=pageview&_s=1&dl=https%3A%2F%2Fwww.example.com%2F&ul=en-gb&de=UTF-8&dt=Title&sd=24-bit&sr=1280x720&vp=1263x614&je=0&fl=21.0%20r0&_u=QCCAAEABI~&jid=1451351253&cid=855412598.1465456935&tid=UA-28513095-1&_r=1&z=1190582191 HTTP/1.1 https://www.example.com/ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36"
This is my filebeat.yml config:
filebeat:
  prospectors:
    -
      paths:
        - /var/log/nginx/custom.log
      input_type: log
      document_type: cx-gacm-log

output:
  logstash:
    hosts: ["127.0.0.1:31312"]
  console:
    pretty: true

shipper:

logging:
  files:
    path: /var/log/gacm-filebeat
    rotateeverybytes: 10485760 # = 10MB
  level: debug
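If it helps, I believe filebeat can also be run in the foreground with debug output so I can see exactly what it publishes for each line (the config path here is just an example):

  # run in the foreground, log to stderr, and dump each published event
  filebeat -e -c /etc/filebeat/filebeat.yml -d "publish"

That should at least let me confirm the exact message string being shipped to Logstash.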
Here is the Logstash pipeline config for the beats input:
input {
  beats {
    port => 31312
  }
}
filter {
  if [type] == "cx-gacm-log" {
    grok {
      match => {
        "message" => "%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:remote_ip} %{NUMBER:status_code} %{WORD:request_action} %{URIPATH:collector_path}(?:%{URIPARAM:request_params})? HTTP/%{NUMBER:http_version} %{URIPROTO:referrer_proto}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST:referrer_host})?(?:%{URIPATHPARAM})? \"%{DATA:agent}\""
      }
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    geoip {
      source => "remote_ip"
      target => "geoip"
    }
    useragent {
      source => "agent"
      target => "user_agent"
    }
    # key value split for the parameters
    kv {
      field_split => "&?"
      source => "request_params"
    }
    mutate {
      # remove the fields we don't want
      remove_field => ["message", "request_params", "agent"]
    }
    urldecode {
      all_fields => true
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "%{[@metadata][type]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
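One thing I still plan to try is temporarily adding a rubydebug stdout to this pipeline (mirroring what the stdin pipeline already does) so I can compare the exact message field that arrives over beats with what stdin receives, in case filebeat is shipping extra whitespace or a trailing newline. Roughly, alongside the elasticsearch output:

  if [type] == "cx-gacm-log" {
    stdout {
      codec => rubydebug
    }
  }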
I use a different config file for the stdin pipeline just to make things a little easier, but it is essentially the same.
Here is the stdin pipeline config:
input {
  stdin {
    type => "cx-gacm-stdin"
  }
}
filter {
  if [type] == "cx-gacm-stdin" {
    grok {
      match => {
        "message" => "%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:remote_ip} %{NUMBER:status_code} %{WORD:request_action} %{URIPATH:collector_path}(?:%{URIPARAM:request_params})? HTTP/%{NUMBER:http_version} %{URIPROTO:referrer_proto}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST:referrer_host})?(?:%{URIPATHPARAM})? \"%{DATA:agent}\""
      }
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    geoip {
      source => "remote_ip"
      target => "geoip"
    }
    useragent {
      source => "agent"
      target => "user_agent"
    }
    # key value split for the parameters
    kv {
      field_split => "&?"
      source => "request_params"
    }
    mutate {
      # remove the fields we don't want
      remove_field => ["message", "request_params", "agent"]
    }
    urldecode {
      all_fields => true
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "%{[@metadata][type]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  if [type] == "cx-gacm-stdin" {
    stdout {
      codec => rubydebug
    }
  }
}
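For reference, this is roughly how I run the stdin test (the config file name is just illustrative, and the log line is truncated here for readability):

  echo '2016-06-11T16:28:33+10:00 60.241.43.104 200 GET /test.gif?v=1&... HTTP/1.1 https://www.example.com/ "Mozilla/5.0 ..."' | bin/logstash -f stdin-pipeline.conf

The event comes out of the rubydebug stdout fully parsed, with no _grokparsefailure tag.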
Grokdebug also validates and parses the log line fine, so I am thinking it is something to do with filebeat, but I didn't think there was anything special in the setup.
Just to fill out the question: I am using the latest release versions of the whole stack (not 5.x), running on an Ubuntu 14.04 box with plenty of resources. The log is a custom nginx log format, and it is just logging a GET on a single GIF file.
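For completeness, the nginx side is just a custom log_format; reconstructed from the sample line above it would be something along these lines (the format name is made up and the exact variable list is my best guess, not copied from the server):

  log_format gacm '$time_iso8601 $remote_addr $status $request $http_referer "$http_user_agent"';
  access_log /var/log/nginx/custom.log gacm;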