Unless configured otherwise, the date filter stores the parsed result in the @timestamp field and your syslog_timestamp field is left as-is. Why not just remove that field?
Our application code (which can't be changed) requires the syslog_timestamp field. I think I see that %{SYSLOGTIMESTAMP:syslog_timestamp} is not matching in the grok, because that predefined pattern does not include milliseconds.
The error message indicates that the syslog_timestamp field indeed contains "May 4 19:22:07.321" so I don't think the grok filter is the problem. What does an example event produced by Logstash look like? Use a stdout { codec => rubydebug } output.
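For reference, a temporary debugging output along those lines could look like this (remove it once the events look right):

    output {
      # Prints every event, including all fields, to the console.
      stdout { codec => rubydebug }
    }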
date {
  match => [ "syslog_timestamp", "MMM d HH:mm:ss.SSS", "MMM dd HH:mm:ss.SSS" ]
}
I think the problem is that the built-in grok pattern %{SYSLOGTIMESTAMP} does not have milliseconds. How can I define a custom SYSLOGTIMESTAMP grok pattern that includes milliseconds?
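As an aside, the stock SECOND sub-pattern already permits an optional fractional part, so %{SYSLOGTIMESTAMP} usually matches milliseconds anyway, which is consistent with the reply below. But if you do want an inline custom pattern, the grok filter's pattern_definitions option supports that. A minimal sketch, where SYSLOGTIMESTAMP_MS is a made-up name and the overall message layout is assumed:

    filter {
      grok {
        # Hypothetical pattern that requires an explicit
        # three-digit milliseconds component.
        pattern_definitions => {
          "SYSLOGTIMESTAMP_MS" => "%{MONTH} +%{MONTHDAY} %{HOUR}:%{MINUTE}:[0-5][0-9]\.[0-9]{3}"
        }
        match => { "message" => "%{SYSLOGTIMESTAMP_MS:syslog_timestamp} %{GREEDYDATA:syslog_message}" }
      }
    }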
There is nothing wrong with either grok (syslog_timestamp has the value you want) or date (so does @timestamp). The problem is when it tries to index that syslog_timestamp field: Elasticsearch wants to convert syslog_timestamp to a date, but cannot figure out what the format is. As Magnus said, you have already extracted @timestamp from it, so why not remove the field using mutate + remove_field?
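A minimal sketch of that cleanup. remove_field is a standard common option, and putting it on the date filter itself (rather than a separate mutate) means it only runs when the parse succeeds:

    filter {
      date {
        match => [ "syslog_timestamp", "MMM d HH:mm:ss.SSS", "MMM dd HH:mm:ss.SSS" ]
        # Only fires on a successful parse, so a failed parse
        # keeps the raw value around for debugging.
        remove_field => [ "syslog_timestamp" ]
      }
    }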
OK, the developers said it is fine to remove syslog_timestamp after all. So the only remaining question: is received_at the syslog timestamp, and @timestamp the time at which Logstash processed the message?
You mean BEFORE, right? @timestamp is set to the timestamp embedded in the syslog message. received_at is set to the value of @timestamp before you parse syslog_timestamp, which will be when Logstash received it, which will always be after syslog sent it.
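In config terms, the usual ordering looks something like this (a sketch, assuming syslog_timestamp has already been extracted by grok):

    filter {
      # Capture the arrival time before it is overwritten.
      mutate {
        add_field => { "received_at" => "%{@timestamp}" }
      }
      # Then replace @timestamp with the time embedded in the message.
      date {
        match => [ "syslog_timestamp", "MMM d HH:mm:ss.SSS", "MMM dd HH:mm:ss.SSS" ]
      }
    }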
First challenge... that grok you are using is way too basic to catch the multitude of strange things vendors send via syslog. Unless you have a lot of control over the source data, you will need something more complex.
Second challenge... related to the above, vendors will send many different timestamp formats, and your date filter will have to account for these. Keep in mind that order matters: the first match wins, so the patterns should be ordered from most specific to least specific to avoid matching the wrong one (see the sketch below).
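Something along these lines, with illustrative formats only; your devices will dictate the actual list:

    date {
      match => [
        "syslog_timestamp",
        # Most specific first: variants with milliseconds...
        "MMM dd HH:mm:ss.SSS",
        "MMM  d HH:mm:ss.SSS",
        # ...then the plain second-resolution variants.
        "MMM dd HH:mm:ss",
        "MMM  d HH:mm:ss"
      ]
    }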
For a good example of the basic syslog processing that we do, take a look at this...
We have gigabytes of sample data from hundreds of different devices and apps, and the methods used in that repo do a pretty good job of handling the couple dozen variations that we have seen.
You may be able to adapt some of the concepts to your needs.