Date parsing issue in ELK Logstash with the custom Java timestamp format of logs


(Krishnachandra Menon) #1

Following are sample logs received from a Java application:

2019-04-11 9:08:22:562 Log 1 
2019-04-11 9:08:22:660 Log 2 
2019-04-11 9:08:43:79 Log 3 
2019-04-11 9:08:43:156 Log 4 

In the logs above, I'm facing an issue with Log 3, where the milliseconds value is written as 79. After parsing in Logstash, the value is set to 790 ms (the Logstash parsing is correct; the Java log value is wrong). For proper parsing, the timestamp in the log should be 2019-04-11 9:08:43:079.
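As an illustration only: the same "fraction of a second" reading can be reproduced outside Logstash. The sketch below uses Python's `datetime.strptime`, which is not the parser Logstash uses; it is only an analogy, chosen because its `%f` directive also treats the trailing digits as a fraction, so 79 is read as 0.79 s.

```python
from datetime import datetime

# Analogy only: this is Python's parser, not the Logstash date filter.
# %f reads the digits as a fraction of a second, so "79" means 0.79 s.
fmt = "%Y-%m-%d %H:%M:%S:%f"

short = datetime.strptime("2019-04-11 9:08:43:79", fmt)
padded = datetime.strptime("2019-04-11 9:08:43:079", fmt)

short_ms = short.microsecond // 1000    # 790
padded_ms = padded.microsecond // 1000  # 79

print(short_ms, padded_ms)
```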

Logstash filter is as below:

date {
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss:SSS", "ISO8601" ]
    target => "log_time"
    timezone => "CET"
}

On digging deeper, I found that the issue is with the time format the Java app uses for logging. It would be resolved if the format were yyyy-MM-dd HH:mm:ss.SSS, but the logging application uses yyyy-MM-dd HH:mm:ss:SSS, which causes this issue (note the difference between :SSS and .SSS).

I cannot change the Java logging application, so is there any workaround in the Logstash filter to fix this issue?


#2

Insert the missing 0s using mutate+gsub

    mutate { gsub => [ "log_time", "^([0-9-]+ [0-9]+:[0-9]{2}:[0-9]{2}:)([0-9])$", "\100\2",
                       "log_time", "^([0-9-]+ [0-9]+:[0-9]{2}:[0-9]{2}:)([0-9]{2})$", "\10\2" ] }

(Krishnachandra Menon) #3

Thank you. It resolved my issue. Can you please elaborate on the solution, i.e. explain what this regular expression does to the millisecond format?


#4

The gsub option of the mutate filter expects an array of triplets as its parameter. The first of the three items is the name of the field to be modified, the second is a pattern to match, and the third is the replacement. In the first triplet the replacement is \100\2. The \1 and \2 are back-references to capture groups in the pattern that is matched. So it says to copy the text captured by the first group, then insert 00, then copy the text captured by the second group. Given that it is trying to change 2019-04-11 9:08:43:7 into 2019-04-11 9:08:43:007 it should be clear what it is trying to do.

So the first pattern has to match 2019-04-11 9:08:43:. I anchor the pattern to the start of the string using ^ as a performance optimization. If, for some reason, log_time does not start with a number then the pattern will fail immediately instead of seeking matches to a sub-string of log_time.

In

([0-9-]+ [0-9]+:[0-9]{2}:[0-9]{2}:)

the parentheses represent a capture group, which is later referenced using \1. Square brackets enclose a character group: in this case digits and a hyphen. The + after the group means one or more occurrences of members of the group. So [0-9-]+ matches the date. That's followed by a space, then one or more digits (the hour), then a colon. The {2} means exactly two occurrences of the preceding character group (this can also be a range, so {2,4} means between two and four occurrences), so the two [0-9]{2}: parts match the minutes and the seconds, each followed by a colon.
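To see what that group captures, here is a small sketch in Python rather than Logstash (the variable names are just for illustration). The pattern is copied unchanged from the filter; Python's `re` module is assumed to treat these particular constructs the same way.

```python
import re

# The first capture group from the gsub pattern, anchored with ^.
prefix = re.compile(r"^([0-9-]+ [0-9]+:[0-9]{2}:[0-9]{2}:)")

sample = "2019-04-11 9:08:43:79"
m = prefix.match(sample)

captured = m.group(1)        # what \1 refers to
remainder = sample[m.end():]  # what is left for the second group

print(repr(captured), repr(remainder))

# {2,4} as a range: between two and four digits.
two_to_four = re.compile(r"[0-9]{2,4}")
range_checks = [bool(two_to_four.fullmatch(s)) for s in ("1", "12", "1234", "12345")]
```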

The rest of the pattern ([0-9])$ is just a character group for a single digit followed by the end of the string. If there is only a single digit I insert 00 in front of it. The second triplet matches the case where there are two digits in the millisecond field and there I insert 0 in front of it.
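If you want to try the two substitutions outside Logstash, the following is a minimal sketch in Python. The helper name `pad_millis` is made up for this example. One difference is an assumption to watch for: in Python's `re.sub` the replacement \100 would be read as an octal escape, so the sketch writes the back-references as `\g<1>` and `\g<2>` instead.

```python
import re

# Same prefix group as in the gsub patterns above.
PREFIX = r"^([0-9-]+ [0-9]+:[0-9]{2}:[0-9]{2}:)"


def pad_millis(log_time: str) -> str:
    """Left-pad a 1- or 2-digit millisecond field to three digits."""
    # First triplet: a single digit gets "00" inserted in front of it.
    log_time = re.sub(PREFIX + r"([0-9])$", r"\g<1>00\g<2>", log_time)
    # Second triplet: two digits get "0" inserted in front of them.
    log_time = re.sub(PREFIX + r"([0-9]{2})$", r"\g<1>0\g<2>", log_time)
    return log_time


for value in ("2019-04-11 9:08:43:7",
              "2019-04-11 9:08:43:79",
              "2019-04-11 9:08:22:562"):
    print(value, "->", pad_millis(value))
```

A value that already has three digits matches neither pattern and passes through unchanged, which is why the two triplets can safely be applied to every event.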


(Krishnachandra Menon) #5

Thanks a lot Badger, it's crystal clear.