Logstash: Log records fail to load in large log file but work when run separately

Filebeat is feeding data into Logstash, which passes it on to Elasticsearch. I'm loading around 5 million records and I've noticed a lot of events tagged _grokparsefailure.
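If it helps to quantify the failures, here is a quick sketch (my own, not from the thread) that counts tagged events in Elasticsearch. It assumes the host and index pattern from the pipeline config below, plus the Python requests library; the date filter tags its failures _dateparsefailure, so I count both:

    import requests

    # Count events tagged with parse failures. Host and index pattern are
    # taken from the pipeline config shown below; adjust as needed.
    url = "http://x.x.x.x:9200/companyAtrantrack-companyB-*/_count"
    for tag in ("_grokparsefailure", "_dateparsefailure"):
        resp = requests.get(url, json={"query": {"match": {"tags": tag}}})
        resp.raise_for_status()
        print(tag, resp.json()["count"])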

When I drill into the Logstash output I can see the following:

15:11:09.999 [[main]>worker1] WARN logstash.filters.date - Failed parsing date from field {:field=>"time", :value=>"2016-10-02_02:59:40.997", :exception=>"Invalid format: "2016-10-02_02:59:40.997" is malformed at "_02:59:40.997"", :config_parsers=>"yyyy-MM-dd_HH:mm:ss.SSS,yyyy-MM-dd HH:mm:ss.SSS", :config_locale=>"default=en_AU"}

The record that caused this to fail is:

2016-10-02_02:59:40.997 [transaction-1] INFO companyA.engine.TransactionHandler - Transaction Completed : SOCKETID=62437,TXNREFERENCE=,CLIENTID=10000031,RESPONSECODE=-1,RESPONSETEXT=TRANSACTION NOT FOUND,DURATION=0,TRANSACTIONTYPE=STATUS[STATUS],INTERFACE=CREDITCARD

I tested this record in the Grok Debugger and it parsed fine.
I also loaded it separately in its own log file and it loaded fine.

The Logstash pipeline config file is:

input {
  stdin {
  }
  beats {
    port => "5043"
  }
}

filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{TRANTRACK_DATE:time} %{NOTSPACE} %{NOTSPACE} %{NOTSPACE} - %{NOTSPACE} Completed :\s\w*=(%{WORD:socketval})?,\w*=(%{WORD:txnref})?,\w*=(%{WORD:clientid})?,\w*=(%{NUMBER:respcode})?,\w*=(%{MULTI_WORD:resptext})?,\w*=(%{WORD:duration})?,\w*=(%{TRAN_WITH_SUBTYPE:txntype})?,\w*=(%{WORD:interface})?" }
    overwrite => ["message"]
  }
  if ([message] =~ "Transaction Start") {
    drop {}
  }
  date {
    match => ["time", "yyyy-MM-dd_HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss.SSS"]
    #timezone => "Australia/Sydney"
  }
}

output {
  #stdout { codec => json }
  elasticsearch {
    hosts => ["x.x.x.x:9200"]
    index => "companyAtrantrack-companyB-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Every record that failed as part of the main load works when loaded in a smaller file on its own. I'm confused.

The pattern seems OK, and it works for me with just 2016-10-02_02:59:40.997.
Might be worth testing this again with the original file; maybe there are invisible control characters in there or something (a quick check is sketched below).
The error message points at this part: "_02:59:40.997".
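For what it's worth, a minimal sketch of such a check, assuming Python 3 (companyA.log is a hypothetical name; point it at the actual file Filebeat is shipping):

    import sys

    # Scan a log file for non-printable bytes that could break grok matching.
    with open("companyA.log", "rb") as f:  # hypothetical name, use the real log
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip(b"\r\n")
            # Flag control bytes (below 0x20, except tab) and DEL (0x7F).
            suspicious = [b for b in line if (b < 0x20 and b != 0x09) or b == 0x7F]
            if suspicious:
                print(f"line {lineno}: control bytes {suspicious!r}", file=sys.stderr)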

OK, I just saw your other post, and I think you already answered your own question there.

After enabling timezone => "Australia/Sydney" (I had used UTC+1 before), I'm getting this error too. It's because there is no 02:00-03:00 on 2016-10-02 in that timezone: the clocks jump straight from 02:00 to 03:00 for the daylight saving switchover, so 02:59:40.997 never existed as a local time.
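To illustrate the gap, a minimal sketch (my addition, assuming Python 3.9+ with the zoneinfo module). Logstash's date filter uses Joda-Time, which refuses to parse local times that fall inside a DST gap; that is exactly the "Failed parsing date" warning above:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    sydney = ZoneInfo("Australia/Sydney")

    # DST began in Sydney on 2016-10-02: at 02:00 local time the clocks jumped
    # straight to 03:00, so local times in [02:00, 03:00) never occurred that day.
    print(datetime(2016, 10, 2, 1, 59, 59, tzinfo=sydney).utcoffset())  # 10:00:00 (AEST)
    print(datetime(2016, 10, 2, 3, 0, 0, tzinfo=sydney).utcoffset())    # 11:00:00 (AEDT)

    # The failing timestamp sits inside the gap. zoneinfo silently picks an
    # offset for it; Joda-Time throws instead, hence the _dateparsefailure.
    gap = datetime(2016, 10, 2, 2, 59, 40, 997000, tzinfo=sydney)
    print(gap.utcoffset())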
