Bug in grok pattern for German/Austrian month names

(Administrator Valida) #1


We are using Logstash 5.0.0 and a grok pattern to parse a timestamp. Here is a sample log line:

18 Jän 2016 16:14:40,408 INFO LOGINFOSAMPLE

With the grok pattern %{MONTH} we are unable to parse the timestamp, because the pattern does not accept the umlaut in "Jän", the Austrian abbreviation for January. March, for example, is no problem, since the pattern already allows the umlaut there ("Mär").

See also https://dict.leo.org/englisch-deutsch/january.html


(Administrator Valida) #2

The original pattern:
MONTH \b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b

The working pattern:
MONTH \b(?:J(?:a|ä)?n(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b
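
The difference between the two patterns can be checked outside Logstash. The following is only a rough sketch using Python's `re` module, not grok itself: grok uses Oniguruma regular expressions, and I am assuming both engines treat these particular patterns the same way. The helper name `find_month` is made up for this example.

```python
import re

# Month alternation as originally shipped (copied from the post above).
ORIGINAL_MONTH = (
    r"\b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?"
    r"|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?"
    r"|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b"
)

# Same alternation with the first branch changed to accept "ä" after "J".
FIXED_MONTH = (
    r"\b(?:J(?:a|ä)?n(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?"
    r"|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?"
    r"|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b"
)

SAMPLE = "18 Jän 2016 16:14:40,408 INFO LOGINFOSAMPLE"


def find_month(pattern, line):
    """Return the first month token the pattern finds in line, or None."""
    match = re.search(pattern, line)
    return match.group(0) if match else None


print(find_month(ORIGINAL_MONTH, SAMPLE))  # None
print(find_month(FIXED_MONTH, SAMPLE))     # Jän
```

Note that the working pattern only covers the abbreviation: in this sketch the spelled-out form "Jänner" still does not match, because the trailing `\b` cannot be satisfied in the middle of the word.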

