I have a log file from Apache's ssl_request_log,
which has a slightly different format from the other Apache logs in that it reorders several of the fields. So I copied the macro from %{COMBINEDAPACHELOG}
and made a few changes to get this:
match => { "message" => "\[%{HTTPDATE:timestamp}\] %{IPORHOST:clientip} %{NOTSPACE:tls_version} %{NOTSPACE:crypto_algorithm} \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response}" }
I've pasted the pattern (without the enclosing quotes) into the online grok debugger, and everything is parsed out as expected. Yet Logstash fails to parse the same lines, and I am left with a _grokparsefailure
tag and my own no_apache_match
tag on every entry from this source in Elasticsearch.
Here is an example line:
[06/Oct/2017:14:54:04 -0600] 172.17.0.47 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "GET /sadf/sdf.jpg?134 HTTP/1.1" 200
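As a sanity check outside of grok entirely, the pattern structure does match this line. Here is a rough Python approximation of the grok pattern (an assumption on my part: the named groups mirror the grok captures, and the character classes are simplified stand-ins for HTTPDATE, IPORHOST, NOTSPACE, etc., not grok's exact compiled regex):

```python
import re

# Simplified regex equivalent of the ssl_request_log grok pattern.
# Group names match the grok capture names; character classes are approximate.
SSL_REQUEST = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '          # [06/Oct/2017:14:54:04 -0600]
    r'(?P<clientip>\S+) '                  # 172.17.0.47
    r'(?P<tls_version>\S+) '               # TLSv1.2
    r'(?P<crypto_algorithm>\S+) '          # ECDHE-RSA-AES128-GCM-SHA256
    r'"(?:(?P<verb>\w+) (?P<request>\S+)'  # "GET /sadf/sdf.jpg?134
    r'(?: HTTP/(?P<httpversion>[\d.]+))?'  #  HTTP/1.1 (optional)
    r'|(?P<rawrequest>[^"]*))" '           # fallback for malformed requests
    r'(?P<response>\d+)'                   # 200
)

line = ('[06/Oct/2017:14:54:04 -0600] 172.17.0.47 TLSv1.2 '
        'ECDHE-RSA-AES128-GCM-SHA256 "GET /sadf/sdf.jpg?134 HTTP/1.1" 200')
m = SSL_REQUEST.match(line)
print(m.group('verb'), m.group('request'), m.group('response'))
# GET /sadf/sdf.jpg?134 200
```

So the line itself is well-formed for this pattern; the failure has to come from somewhere in the pipeline, not the regex logic.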
I'm fairly confident in my pipeline configuration, which is set up as a chain of failovers (there is probably some redundancy in here, but I don't see any issues):
filter {
  if "apache" in [tags] {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
      remove_field => "message"
      add_tag => ["combinedapachelog"]
    }
    if "combinedapachelog" not in [tags] {
      grok {
        match => { "message" => "%{COMMONAPACHELOG}" }
        remove_field => "message"
        add_tag => ["commonapachelog"]
      }
      if "commonapachelog" not in [tags] {
        grok {
          match => { "message" => "%{HTTPD_ERRORLOG}" }
          remove_field => "message"
          add_tag => ["httpd_errorlog"]
        }
        if "httpd_errorlog" not in [tags] {
          grok {
            match => { "message" => "\[%{HTTPDATE:timestamp}\] %{IPORHOST:clientip} %{NOTSPACE:tls_version} %{NOTSPACE:crypto_algorithm} \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response}" }
            remove_field => "message"
            add_tag => ["ssl_request_log"]
            tag_on_failure => ["no_apache_match"]
          }
        }
      }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      locale => "en"
      remove_field => ["timestamp"]
    }
    if "no_apache_match" not in [tags] {
      geoip {
        source => "clientip"
        target => "geoip"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
      }
    }
    mutate {
      convert => {
        "[geoip][coordinates]" => "float"
      }
    }
    useragent {
      source => "agent"
      target => "useragent"
    }
  }
  else if "syslog" in [tags] {
    grok {
      match => { "message" => "^<%{POSINT:syslog_pri}>(?:%{TIMESTAMP_ISO8601:syslog_timestamp}|%{SYSLOGTIMESTAMP:syslog_timestamp}) %{SYSLOGHOST:syslog_hostname} %{DATA:program}(?:\[%{NONNEGINT:pid}\]?)?:? %{GREEDYDATA:syslog_message}" }
    }
    syslog_pri {
      syslog_pri_field_name => "syslog_pri"
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
  else if "fiesta" in [tags] {
    grok {
      match => { "message" => "%{GREEDYDATA:data}" }
    }
    mutate {
      add_field => { "[location][lat]" => "%{lat}" }
      add_field => { "[location][lon]" => "%{lon}" }
    }
    mutate {
      convert => {
        "[location][lat]" => "float"
        "[location][lon]" => "float"
      }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
}
This is my only filter, from a single config file. All of my log entries from the ssl_request_log
source end up tagged as no_apache_match
, meaning they fail on every grok match attempt in my apache chain.
I'm not really sure how to go about debugging this further since the online debugger works.
UPDATE:
So something is very strange. I set up a test like so:
# contents of conf.test
input {
  stdin { }
}
filter {
  grok {
    match => { "message" => "\[%{HTTPDATE:timestamp}\] %{IPORHOST:clientip} %{NOTSPACE:tls_version} %{NOTSPACE:crypto_algorithm} \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response}" }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
# contents in test.dat
[06/Oct/2017:14:54:04 -0600] 172.17.0.47 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "GET /sadf/sdf.jpg?134 HTTP/1.1" 200
# actual start command
/usr/share/logstash/bin/logstash --debug -f /etc/logstash/conf.d/conf.test < /etc/logstash/conf.d/test.dat
and it works! I get the document printed to stdout with all fields parsed correctly. This is strange.