I recently traced an issue back to a grok pattern that Logstash uses.


The short version: this pattern only works reliably when the message starts with it, or when there is a concrete definition of exactly what has to come in front of it. If you have anything like GREEDYDATA in front of it, the millennium and century digits can end up in the GREEDYDATA pattern/variable.

The reason is the definition of the year:

YEAR (?>\d\d){1,2}

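On the surface this looks fine: it matches one or two pairs of digits.
Unfortunately, with GREEDYDATA in front of it, that flexibility backfires. GREEDYDATA (.*) first grabs as much as it can and then gives characters back only until the rest of the pattern matches, and a single pair of digits is already enough for YEAR. So '20' ends up as the last part of the GREEDYDATA capture and '17' becomes the year, leading to filenames with 0017 as the year!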
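To see the split in isolation, here is a minimal sketch using Python's re module rather than the Oniguruma engine that grok uses (the assumption is that backtracking behaves the same for these simple constructs). The pattern is a simplified, hypothetical stand-in for GREEDYDATA followed by a date: month and day are reduced to two digits each, and the atomic group in YEAR is written as a plain (?:...) group, which is equivalent here because \d\d has nothing to backtrack into. The helper split_line and the sample line are made up for illustration.

```python
import re

# Hypothetical, simplified stand-ins for the grok patterns involved.
GREEDYDATA = r".*"
# Equivalent to (?>\d\d){1,2}: the atomic group wraps a fixed-length \d\d,
# so a plain group behaves the same (and also works on Python < 3.11).
YEAR = r"(?:\d\d){1,2}"
# Month and day simplified to two digits each for this sketch.
DATE_TAIL = r"-\d\d-\d\d"


def split_line(line):
    """Return (prefix, year) as captured by GREEDYDATA followed by YEAR."""
    pattern = rf"(?P<prefix>{GREEDYDATA})(?P<year>{YEAR}){DATE_TAIL}"
    match = re.match(pattern, line)
    if match is None:
        return None
    return match.group("prefix"), match.group("year")


print(split_line("app.log 2017-05-04 started"))
# ('app.log 20', '17')
```

The greedy .* backtracks from the end of the line and stops at the first position where YEAR plus the date tail fits. Scanning from the right, that is "17-05-04", so "20" stays in the prefix.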

I would much rather see a YEAR definition like ((\d\d\d\d){1}|(\d\d){1,2})
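One way to check a candidate YEAR definition is to run it through the same kind of simplified pattern. The sketch below (again Python's re, with a hypothetical helper extract_year and a made-up log line) compares the current definition, the alternation above, and a few alternatives. In this sketch the alternation on its own still yields '17': the greedy prefix stops at the rightmost position where any branch fits, and at "17-05-04" the two-digit branch fits. Requiring exactly four digits, adding a leading word boundary, or making the prefix non-greedy all yield '2017' here.

```python
import re

LINE = "app.log 2017-05-04 started"  # made-up sample line


def extract_year(prefix, year):
    """Year captured by <prefix><year>-MM-DD, with month and day simplified to two digits."""
    match = re.match(rf"(?:{prefix})(?P<year>{year})-\d\d-\d\d", LINE)
    return match.group("year") if match else None


candidates = {
    "current YEAR, greedy prefix": (r".*", r"(?:\d\d){1,2}"),
    "proposed YEAR, greedy prefix": (r".*", r"((\d\d\d\d){1}|(\d\d){1,2})"),
    "four digits only, greedy prefix": (r".*", r"\d{4}"),
    "current YEAR, leading word boundary": (r".*", r"\b(?:\d\d){1,2}"),
    "current YEAR, non-greedy prefix": (r".*?", r"(?:\d\d){1,2}"),
}

for name, (prefix, year) in candidates.items():
    print(f"{name}: {extract_year(prefix, year)}")
```

The four-digit variant matches what ISO 8601 asks for. The word-boundary and non-greedy variants change the surrounding pattern rather than YEAR itself, and the word boundary would stop matching entirely if the date directly follows a letter, digit, or underscore.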

Especially for ISO8601, I would expect a four-digit year to take precedence.


ISO 8601 does require four-digit years (https://en.wikipedia.org/wiki/ISO_8601#Years), so technically the current definition is wrong. However, changing it would be problematic for backwards-compatibility reasons.
