I have a feeling I already know the answer to this (probably "can't be done"), but it's worth a shot.
The application logs I'm recording have the unfortunate quirk of having no year in the event timestamps, so the current year is always assumed. There is one exception: the application startup event includes the year in its message. So I would like to pull the year from that event and attach it to every subsequent event in that log file until the next startup event.
Is this possible? Will this need to be done by a third-party program/script prior to being pushed to Logstash?
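If it does end up as a preprocessing step outside Logstash, the carry-forward logic from the question is small. The sketch below assumes two hypothetical formats that you would need to adjust: event lines start with a syslog-style timestamp such as `Mar 14 10:22:01`, and the startup message contains the text `Application startup` followed by a four-digit year. Lines seen before any startup event are passed through unchanged.

```python
import re
import sys
from datetime import datetime

# HYPOTHETICAL formats: adjust both patterns to match your real log lines.
# Event lines start with a year-less timestamp, e.g. "Mar 14 10:22:01 ..."
EVENT_TS = re.compile(r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<rest>.*)$")
# Startup lines carry the year in the message, e.g. "... Application startup 2019-03-14"
STARTUP_YEAR = re.compile(r"Application startup.*?\b(?P<year>\d{4})\b")


def add_years(lines, default_year=None):
    """Yield each line with its timestamp rewritten as ISO 8601, using the
    year from the most recent startup event seen so far."""
    year = default_year
    for line in lines:
        line = line.rstrip("\n")
        startup = STARTUP_YEAR.search(line)
        if startup:
            # A new startup event resets the year for all following lines.
            year = int(startup.group("year"))
        m = EVENT_TS.match(line)
        if m and year is not None:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            yield f"{ts.isoformat()} {m.group('rest')}"
        else:
            # No timestamp, or no startup event seen yet: leave the line as-is.
            yield line


if __name__ == "__main__":
    for out in add_years(sys.stdin):
        print(out)
```

Run as a filter, e.g. `python add_years.py < app.log > app-with-years.log`, and point Logstash at the rewritten file. Because it reads each file from the top in one pass, it sidesteps the state problem, at the cost of an extra step in the pipeline.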
No, there's no support for this. It's actually not so easy to do since a generic solution can't assume that the whole log is read in one swoop. What if Logstash is interrupted halfway through? It would have to save the additional metadata about the year from the start of the file.
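To make the interruption problem concrete: whatever tracks the year has to persist it alongside the read position, or a restart in the middle of a file resumes with no year. A minimal sketch, assuming a hypothetical JSON state file keyed by log path and the same hypothetical `Application startup` pattern, could look like this. It only remembers the year; the file offset itself would still need tracking elsewhere, as Logstash's file input does with its sincedb.

```python
import json
import os
import re

# HYPOTHETICAL startup pattern: adjust to the real startup message.
STARTUP_YEAR = re.compile(r"Application startup.*?\b(\d{4})\b")


class YearState:
    """Remember the last startup year seen per log file, persisted to disk
    so that a restart halfway through a file does not lose it."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.years = {}
        if os.path.exists(state_path):
            with open(state_path, encoding="utf-8") as f:
                self.years = json.load(f)

    def observe(self, log_path, line):
        """Update the year for log_path if line is a startup event, then
        return the year in effect (None if no startup event seen yet)."""
        m = STARTUP_YEAR.search(line)
        if m:
            self.years[log_path] = int(m.group(1))
            self._save()
        return self.years.get(log_path)

    def _save(self):
        # Write to a temp file first, then swap it in with a single rename.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.years, f)
        os.replace(tmp, self.state_path)
```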
What you could do is write a small custom filter for this kind of log entry that reads the log file until it finds the startup event and saves the year in a separate field. If the startup event is at the top of the file, you might even get away with a ruby filter that opens the file and reads the first line or so. That would be quite slow, but maybe it's good enough.
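The "open the file and read the first few lines" idea is sketched below in Python to show the logic; inside Logstash it would live in a ruby filter that takes the file name from the event's path field. The startup pattern and the `max_lines` limit are hypothetical. Reopening the file for every event is what makes this slow, so caching the result per path (and invalidating it when the file is rotated) would be the obvious refinement.

```python
import re

# HYPOTHETICAL startup pattern: adjust to the real startup message.
STARTUP_YEAR = re.compile(r"Application startup.*?\b(\d{4})\b")


def year_from_file_head(path, max_lines=10):
    """Return the year from a startup event within the first max_lines
    lines of the file at path, or None if none is found."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            m = STARTUP_YEAR.search(line)
            if m:
                return int(m.group(1))
    return None
```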
As Magnus mentions, there is no generic solution to this problem. I ran into it too with the file input and found a workaround: the year was in the filename, and since the filename is stored in each event's path field, grokking that field saved me. I hope this helps you think of a solution that suits your use case.
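The filename workaround amounts to a pattern match on the path. In Logstash that is a grok on the path field; the Python sketch below shows the same extraction under an assumed naming scheme such as `/var/log/myapp/myapp-2019-03-14.log`. Only the file name is searched, so a year-like directory name is ignored.

```python
import os
import re

# HYPOTHETICAL naming scheme: a 4-digit year (19xx or 20xx) somewhere in
# the file name, not glued to other digits.
YEAR_IN_NAME = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def year_from_path(path):
    """Return the first plausible year found in the file name of path, or None."""
    m = YEAR_IN_NAME.search(os.path.basename(path))
    return int(m.group(1)) if m else None
```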