Date Filter Plugin Not Working With Elasticsearch

Hello Everyone!

I am having trouble getting the date filter plugin to work properly with Elasticsearch. Below is my Logstash configuration file:

input {
  beats {
    port => "5043"
  }
}
filter {
  grok {
    match => ["message", '%{GREEDYDATA:IPs} - [ %{GREEDYDATA:auth_message} ] [%{HTTPDATE:logtime}] "%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:anothernumber} %{NUMBER:someothernumber}']
  }
  date {
    match => ["logtime", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  kv {
    source => "auth_message"
    value_split => ":"
  }
}
output {
  # elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
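My understanding is that the date filter parses logtime and overwrites @timestamp by default, which is the behavior I want; spelling out the target option makes that explicit:

date {
  match  => ["logtime", "dd/MMM/yyyy:HH:mm:ss Z"]
  target => "@timestamp"   # the default target, shown here for clarity
}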

When I set the output to stdout, Logstash correctly parses my log and sets @timestamp from my logtime field (like below).

{
    "request" => "/XXXXX-XXX/vX/logout",
    "anothernumber" => "X",
    "offset" => XXXXXXX,
    "input_type" => "log",
    "verb" => "XXX",
    "source" => "/XXX/XXX/XXX/awsLogs.log",
    "someothernumber" => "X",
    "message" => "XXX.XXX.XXX.XXX, XXX.XX.XXXX.XX, XXX.X.XXX.XX - [ auth : no-auth | correlation-id : XXX-XXXX | remote-addr : XX.XXX.XX.XXX | request_method : XXX | request_resource : XXXX | service_name : XXXX-XX ] [02/Jun/2017:16:42:14 +0000] \"XXX /XXX-XXX/vX/logout?redirectUrl=https%3A%2F%2Fapp.hubspot.com%2Flogin&loggedout=true HTTP/1.0\" 307 0 0 0",
    "type" => "log",
    "IPs" => "XX.XXX.XX.XXX, XXX.XX.XXX.XX, XXX.X.XXX.XX",
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
    "@timestamp" => 2017-06-02T16:42:14.000Z,
    ...
    "host" => "ip-XX-XXX-XXX-XX",
    "httpversion" => "X.X",
    "logtime" => "02/Jun/2017:16:42:14 +0000"
}

However, when I make the output go to Elasticsearch (by uncommenting the elasticsearch line in my configuration file), my log message loses a field, the key-value pairs are not parsed, and Kibana shows a "_grokparsefailure" tag on the events.

Please let me know if anyone has any resolution to this issue!

If we can't see the text that fails the grok filter, I don't see how we could help. The fact that the grok filter succeeds for a different string isn't very relevant.

Please format your configuration as preformatted text so backslashes etc aren't mangled.


Hey magnusbaeck,

Sorry for the confusion! Here is my Logstash configuration as preformatted text:
input {
  beats {
    port => "5043"
  }
}
filter {
  grok {
    match => ["message", '%{GREEDYDATA:IPs} - \[ %{GREEDYDATA:auth_message} \] \[%{HTTPDATE:logtime}\] \"%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:anothernumber} %{NUMBER:someothernumber}']
  }
  date {
    match => ["logtime", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  kv {
    source => "auth_message"
    value_split => ":"
  }
}
output {
  stdout { codec => rubydebug }
}

Additionally, here is a sample log line that I am working with:
1.2.3.4, 5.6.7.8 - [ auth : non-auth | auth-level : app | correlation-id : 12312-bd12-41238-8asfd-13rdsfaf | hub-id : 101 | login-id : bshrestha@email.com-596 | remote-addr : 1.2.3.4 | request_method : bypassLogin | request_resource : BypassResource | service_name : Login-mobile | user : true | user-id : 12444212556 ] [02/Jun/2017:16:07:39 +0000] "POST /login/v2/ HTTP/1.0" 401 122 12 12
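In case it helps with reproducing, the whole filter chain can be exercised without Beats by feeding that sample line to a stdin-based copy of the config. This is just a sketch for testing, assuming a local Logstash install (test.conf is an example name):

# test.conf -- same filters, but fed from stdin so the sample line
# can be piped in directly:
#   echo '<sample log line>' | bin/logstash -f test.conf
input {
  stdin { }
}
filter {
  grok {
    match => ["message", '%{GREEDYDATA:IPs} - \[ %{GREEDYDATA:auth_message} \] \[%{HTTPDATE:logtime}\] \"%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:anothernumber} %{NUMBER:someothernumber}']
  }
  date {
    match => ["logtime", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  kv {
    source => "auth_message"
    value_split => ":"
  }
}
output {
  stdout { codec => rubydebug }
}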

If I set the output in my Logstash configuration file to:
stdout { codec => rubydebug }
then the key-value pairs I want are correctly parsed and @timestamp is updated to the actual log's time. Below is a shortened version of what I see in stdout:

{
    "request" => "/XXXXX-XXX/vX/logout",
    "anothernumber" => "X",
    "offset" => XXXXXXX,
    "input_type" => "log",
    "verb" => "XXX",
    "source" => "/XXX/XXX/XXX/awsLogs.log",
    "someothernumber" => "X",
    "message" => "XXX.XXX.XXX.XXX, XXX.XX.XXXX.XX, XXX.X.XXX.XX - [ auth : no-auth | correlation-id : XXX-XXXX | remote-addr : XX.XXX.XX.XXX | request_method : XXX | request_resource : XXXX | service_name : XXXX-XX ] [02/Jun/2017:16:42:14 +0000] \"XXX /XXX-XXX/vX/logout?redirectUrl=https%3A%2F%2Fapp.hubspot.com%2Flogin&loggedout=true HTTP/1.0\" 307 0 0 0",
    "type" => "log",
    "IPs" => "XX.XXX.XX.XXX, XXX.XX.XXX.XX, XXX.X.XXX.XX",
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
    "@timestamp" => 2017-06-02T16:42:14.000Z,
    ...
    "host" => "ip-XX-XXX-XXX-XX",
    "httpversion" => "X.X",
    "logtime" => "02/Jun/2017:16:42:14 +0000"
}

However, when I change the output of my Logstash to:
elasticsearch { hosts => ["localhost:9200"] }
then my key-value pairs are not correctly parsed, part of my log message is omitted, and logtime does not update Logstash's @timestamp (see the screenshot of my Kibana as a reference).

Please let me know if you need any further clarification! I appreciate the quick response!

I really don't think the choice of output plugin has anything to do with this. What does an example event from ES, where the grok parsing failed, look like?
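For example, a conditional in the output section could dump the raw failing events to a file so you can paste one here. This is just a sketch; the path is an example:

output {
  # Capture events that failed grok parsing so the raw message can be inspected.
  if "_grokparsefailure" in [tags] {
    file {
      path => "/tmp/grok-failures.log"   # example path, adjust as needed
    }
  }
  elasticsearch { hosts => ["localhost:9200"] }
}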


There is a grok parsing failure for every log with the format I provided. However, if I remove the date filter plugin from my Logstash configuration file, like this:
input {
  beats {
    port => "5043"
  }
}
filter {
  grok {
    match => ["message", '%{GREEDYDATA:IPs} - \[ %{GREEDYDATA:auth_message} \] \[%{HTTPDATE:logtime}\] \"%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:anothernumber} %{NUMBER:someothernumber}']
  }
  kv {
    source => "auth_message"
    value_split => ":"
  }
}
output {
  stdout { codec => rubydebug }
}
the log line is correctly parsed and indexed into Elasticsearch with no grok parsing failure, like so:

However, when I add the date filter back into my Logstash configuration file, like this:
input {
  beats {
    port => "5043"
  }
}
filter {
  grok {
    match => ["message", '%{GREEDYDATA:IPs} - \[ %{GREEDYDATA:auth_message} \] \[%{HTTPDATE:logtime}\] \"%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:anothernumber} %{NUMBER:someothernumber}']
  }
  date {
    match => ["logtime", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  kv {
    source => "auth_message"
    value_split => ":"
  }
}
output {
  stdout { codec => rubydebug }
}
I get the grok parsing failure for the same logs, like so:
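For completeness, one way to keep the two failure modes apart would be to run date and kv only when grok succeeds, so a date problem cannot be reported as a grok one. This is a sketch: the kv options assume the pairs are pipe-delimited as in my sample line, and trim_key/trim_value need a reasonably recent kv filter version:

filter {
  grok {
    match => ["message", '%{GREEDYDATA:IPs} - \[ %{GREEDYDATA:auth_message} \] \[%{HTTPDATE:logtime}\] \"%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NUMBER:anothernumber} %{NUMBER:someothernumber}']
  }
  # Only run date and kv on events that grok actually parsed.
  if "_grokparsefailure" not in [tags] {
    date {
      match => ["logtime", "dd/MMM/yyyy:HH:mm:ss Z"]
      # A failed date parse is tagged _dateparsefailure (the default),
      # so it cannot be confused with a grok failure.
    }
    kv {
      source      => "auth_message"
      field_split => "|"    # the pairs are separated by pipes, not spaces
      value_split => ":"
      trim_key    => " "    # strip the spaces around keys and values
      trim_value  => " "
    }
  }
}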

It's not the same log entry. The file offset differs. Look, I find it extremely hard to believe that the presence of the date filter is what's making this difference. The failing example clearly doesn't have any auth information right after the IP addresses, which would certainly explain the grok failure. If you look at offset 5087196 in the log file, what do you see?
