I'm trying to import data from a CSV into an ES index using Logstash. In the CSV I have separate date and time columns that I'm combining and then parsing with the date filter plugin so the result can be used as @timestamp. There are 150,500 records, and all of them pass and are correctly matched except for one record. Reviewing this record, there is nothing obviously abnormal about it that would cause such an issue, so I'm at a loss. I've tried deleting the index and rerunning Logstash multiple times, and each time the same record fails. The record is tagged with _dateparsefailure, and its @timestamp is the only one containing the upload time instead of the parsed value.
I'm new to Logstash, so there's probably a better way to do this, but I have a "Date" field that contains a date like "MM/dd/yyyy 12:00:00 AM" (yes, every record is 12 AM) and a "Time" field like "HH:mm". I pass the following filters:
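(The exact filter block isn't quoted in this excerpt; based on the replies below, it presumably looked roughly like the following sketch. The field names Date, Time, and DateTime come from the thread, but the join separator and the date pattern are assumptions.)

```
filter {
  # Keep only the first 10 characters of Date, dropping the trailing " 12:00:00 AM"
  truncate {
    fields       => ["Date"]
    length_bytes => 10
  }

  # Combine the trimmed date and the time column into a single field
  mutate {
    add_field => { "DateTime" => "%{Date} %{Time}" }
  }

  # Parse the combined value into @timestamp
  date {
    match => ["DateTime", "MM/dd/yyyy HH:mm"]
  }
}
```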
I guess that the date filter fails because of the suffix " AM,02:30". Although I have not tried it, I think the correct format definition would be (see here for details): MM/dd/yyyy HH:mm a,ZZ.
You can provide multiple formats for the date filter, so Logstash tries both and chooses the correct one:
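Something along these lines (a sketch only; the field name DateTime and the first pattern are taken from the rest of the thread, the second is the format suggested above):

```
date {
  # The first element is the source field; each remaining element is a
  # candidate pattern, tried in order until one of them matches.
  match => ["DateTime", "MM/dd/yyyy HH:mm", "MM/dd/yyyy HH:mm a,ZZ"]
}
```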
So because I truncate the Date field to 10 bytes, the "12:00:00 AM" part is removed before DateTime is even created. The DateTime field comes out as "DateTime" => "03/13/2016 02:30".
Oh, gotcha, that makes sense. I'm assuming it was just a data entry error then. There are plenty of records during hours 01 and 03 on the 13th, and this was the only record for 02. Is there a better way of combining the Date and Time fields just in the date filter, so I don't have to pass the data through the truncate and mutate filters?