All my app logs start with a timestamp in this format: yyyy-MM-dd HH:mm:ss Z
Log messages can have different formats and I have some grok patterns to parse some of them, but what I want is to always extract this timestamp and use it instead of @timestamp, which Logstash generates automatically.
Is there a way to extract this part of the message and use it as the timestamp field?
Use a grok filter to extract the timestamp into its own field, then use a date filter to parse that date into @timestamp. This is a standard pattern so almost any Logstash configuration example can be a source of inspiration.
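For example, a minimal sketch of that pattern, assuming the timestamp format stated in the question (the field names `timestamp` and `msg` are illustrative):

```
filter {
  grok {
    # Capture the leading timestamp (date, time, and numeric zone offset)
    # into its own field; GREEDYDATA keeps the rest of the line in "msg".
    match => {
      "message" => "^(?<timestamp>%{TIMESTAMP_ISO8601} %{ISO8601_TIMEZONE}) %{GREEDYDATA:msg}"
    }
  }
  date {
    # Parse the extracted string and write it to @timestamp (the default target).
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss Z" ]
  }
}
```

The custom `(?<timestamp>...)` capture is used because TIMESTAMp patterns alone would not span the space before the zone offset; adjust it to whatever your lines actually contain.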
but suppose I have multiple grok patterns in my filter like:

filter {
  grok {
    match => { "message" => [
      "\[%{TIMESTAMP_ISO8601:timestamp}\]",
      "\[%{TIMESTAMP_ISO8601:timestamp}\] ERROR %{JAVACLASS} - ABC",
      "\[%{TIMESTAMP_ISO8601:timestamp}\] \[%{DATA:alg_name}:%{DATA:alg_type}::Algorithm\]"
    ] }
  }
  date {
    locale => "en"
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss Z" ]
    target => "@timestamp"
  }
}
This means all my messages will match only the first pattern, so the timestamp field is added but none of the other fields are extracted. What about the rest?
If I change the order and put the first pattern last, then many messages are tested against all the previous patterns before matching, which consumes resources for nothing!
When you have multiple expressions you must, of course, place the most specific expression first. From an efficiency point of view you should instead order expressions in the order of likelihood of matching but then you'll sacrifice correctness.
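Concretely, that means keeping the catch-all expression last in a single grok filter. By default `break_on_match` is true, so the first expression that matches wins and the rest are skipped (patterns below are taken from the question):

```
filter {
  grok {
    # break_on_match defaults to true: the first matching expression wins,
    # so list the most specific expressions first and the generic one last.
    match => { "message" => [
      "\[%{TIMESTAMP_ISO8601:timestamp}\] ERROR %{JAVACLASS:class} - ABC",
      "\[%{TIMESTAMP_ISO8601:timestamp}\] \[%{DATA:alg_name}:%{DATA:alg_type}::Algorithm\]",
      "\[%{TIMESTAMP_ISO8601:timestamp}\]"
    ] }
  }
}
```

With this ordering every message still gets a `timestamp` field, and messages that fit a specific format also get their extra fields.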
If you're concerned about efficiency you should start by getting rid of unnecessary uses of DATA and GREEDYDATA patterns. Any use of those patterns except at the end of the expression should be scrutinized.
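As a hedged illustration using one of the patterns above: `DATA` in the middle of an expression forces the regex engine to try many split points and backtrack on failure, while a negated character class cannot run past its delimiter:

```
filter {
  grok {
    # Instead of %{DATA:alg_name}:%{DATA:alg_type}, use custom captures
    # that stop at the ":" delimiter, avoiding backtracking.
    match => {
      "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] \[(?<alg_name>[^:]+):(?<alg_type>[^:]+)::Algorithm\]"
    }
  }
}
```

Anchoring expressions with `^` where possible also lets non-matching lines fail fast.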