Your regex looks somewhat complicated to me... Well, regexes do always look complicated...
No need to be too strict in the regex. Don't try to parse the date, just capture its 'shape'.
Plus, I'm not sure the ^ anchor you used is really active for all of the alternative patterns.
In my multiline example I've been using variables to split the patterns: https://play.golang.org/p/o83xIytnBJ
One can simulate variables in beats configuration files via (I didn't test this, hope it works ;)):
filebeat.prospectors:
- type: log
  ...
  multiline.pattern: '^${patterns.timestamp}'

patterns:
  timestamp: '((${patterns.timestamp1})|(${patterns.timestamp2}))'
  # captures dates of type '2017-01-01 01:02:03.456'
  timestamp1: '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
  # captures dates of type 'Jan 01, 2017 1:23:45 AM' or 'Jan 01, 2017 11:23:45 PM'
  timestamp2: '[JFMAMSOND][a-z]{2} \d{2}, \d{4} \d\d?:\d{2}:\d{2} [AP]M'
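To check the 'shape' patterns outside of filebeat, here is a minimal Go sketch in the spirit of the playground example: the sub-patterns live in separate constants and are combined into one anchored expression. The sample log lines and the helper name isEventStart are made up for illustration. The sketch escapes the dot before the milliseconds and uses an uppercase AM/PM so the patterns agree with the comments in the config.

```go
package main

import (
	"fmt"
	"regexp"
)

// Sub-patterns kept in separate constants, mirroring the config variables.
const (
	// dates of type '2017-01-01 01:02:03.456'
	timestamp1 = `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}`
	// dates of type 'Jan 01, 2017 1:23:45 AM' or 'Jan 01, 2017 11:23:45 PM'
	timestamp2 = `[JFMAMSOND][a-z]{2} \d{2}, \d{4} \d\d?:\d{2}:\d{2} [AP]M`
)

// The ^ sits outside the group, so it anchors both alternatives.
var startOfEvent = regexp.MustCompile(`^(?:(?:` + timestamp1 + `)|(?:` + timestamp2 + `))`)

// isEventStart (hypothetical helper) reports whether a line begins a new event.
func isEventStart(line string) bool {
	return startOfEvent.MatchString(line)
}

func main() {
	// Hypothetical sample lines, not taken from a real log.
	lines := []string{
		"2017-01-01 01:02:03.456 INFO starting",
		"Jan 01, 2017 11:23:45 PM com.example.Main run",
		"    at com.example.Main.run(Main.java:42)",
	}
	for _, l := range lines {
		fmt.Printf("%-5v %q\n", isEventStart(l), l)
	}
}
```

Grouping the alternation before applying ^ is the important bit: with '^a|b' the anchor only binds to the first alternative.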
With things becoming more complicated and devs potentially adding more quirks, you might consider some automated testing support:
- have a corpus of logs
- have a corresponding JSON file with the expected events
We do this for filebeat modules. Every module has a defined test corpus (e.g. the apache access module) that we run as part of our system tests (it would be cool if filebeat had a script to run these kinds of tests automatically). The tests use the file output in filebeat to load the events and compare them with the expected output.
Having automated tests gives you a chance to catch breakages whenever you have to adapt or refine the regular expression.
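As a rough idea of what such a corpus check could look like for the regex alone (this is not how the filebeat system tests are implemented), here is a small Go sketch. It assumes the pattern is used with multiline.negate: true and multiline.match: after, so every non-matching line is appended to the previous event. The pattern is simplified, and the corpus, expected events and the groupEvents helper are invented for the example.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"strings"
)

// Simplified stand-in for the real multiline pattern (assumption).
var startOfEvent = regexp.MustCompile(`^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}`)

// groupEvents appends every line that does not match the pattern to the
// preceding event, roughly what negate: true with match: after does.
func groupEvents(lines []string) []string {
	var events []string
	for _, line := range lines {
		if startOfEvent.MatchString(line) || len(events) == 0 {
			events = append(events, line)
			continue
		}
		events[len(events)-1] += "\n" + line
	}
	return events
}

func main() {
	// Hypothetical corpus; in practice this would be read from a log file.
	corpus := "2017-01-01 01:02:03.456 ERROR boom\n" +
		"java.lang.IllegalStateException: boom\n" +
		"    at com.example.Main.run(Main.java:42)\n" +
		"2017-01-01 01:02:04.000 INFO recovered"

	// Hypothetical expected events; in practice loaded from the JSON file.
	expected := []string{
		"2017-01-01 01:02:03.456 ERROR boom\n" +
			"java.lang.IllegalStateException: boom\n" +
			"    at com.example.Main.run(Main.java:42)",
		"2017-01-01 01:02:04.000 INFO recovered",
	}

	got := groupEvents(strings.Split(corpus, "\n"))
	if reflect.DeepEqual(got, expected) {
		fmt.Println("corpus OK")
	} else {
		fmt.Printf("mismatch:\n got: %q\nwant: %q\n", got, expected)
	}
}
```

This only exercises the regular expression. Timing effects such as the flush timeout still need a test against a running filebeat.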
I'm not sure I understand the line splitting issue, but we've seen similar issues with multiline events being split when using the log4j RollingFileAppender. Those issues were caused by the appender buffering content with no flush timeout (or one that is too long), so the buffer is only flushed once it is full. The multiline state machine in filebeat has a flush timeout and sends the event if it has been waiting for too long. The reason is that the state machine cannot tell whether the file or event is complete when no more content is being appended to the file. You can disable the timeout by setting multiline.timeout: 0. If the multiline event is the last event in the log file, the event will then be sent once filebeat finally closes the log file.