I have some log lines that have contextually appended segments. Unfortunately, we're not using a structured logging format, so there's no simple way to parse them out.
My logs have one of these forms:
2019-01-01 00:00:00:000 GMT [pool-12-thread-34] INFO Segment1 Segment2 Segment3 Segment4 com.company.class.name - Actual log message text here
2019-01-01 00:00:00:000 GMT [pool-12-thread-34] INFO Segment1 Segment2 Segment3 com.company.class.name - Actual log message text here
2019-01-01 00:00:00:000 GMT [pool-12-thread-34] INFO Segment1 - Actual log message text here
2019-01-01 00:00:00:000 GMT [pool-12-thread-34] INFO - Actual log message text here
- That is to say, it always starts with a timestamp, pool info, and log level.
- Optionally, there's the text of segment 1.
- Optionally, there's the text of segment 2, but only if segment 1 existed.
- Optionally, there's the text of segment 3, but only if segment 2 existed.
- Finally, there's the message.
The grok processor supports a conditional "if", but not in the version of ES available to me (6.4.1). Is there a better, more performant way to do this?
I could do this in one giant regex, but I don't know how to say "if no match, don't populate the field at all". Additionally, I fear that an ill-designed regex would still test for segment 2 after seeing that segment 1 didn't exist (which is wasteful).
Do you guys have any suggestion for how to implement something like this?
Currently, I'm using a regex along these lines:
^%{TimeStamp}%{SPACE}%{ThreadInfo}%{SPACE}(%{A:a}%{SPACE}(%{B:b}%{SPACE}(%{C:c}%{SPACE})?)?)? - %{text:message}$
It works... but it's pretty nasty, and I'm not sure about how much more efficient it could get if implemented better.
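To make the nesting idea concrete outside of grok, here is a rough Python sketch of the same structure as a plain regex. It is only an illustration, not the grok pattern itself: the group names (a, b, c), the sub-patterns for the timestamp and thread, and the assumption that each segment is a single whitespace-free token are all made up for the example. It handles at most three segments and does not try to cover the class name shown in the longer samples.

```python
import re

# Assumption: a segment is one whitespace-free token that is not the "-" separator.
SEG = r"(?!-\s)\S+"

# Hypothetical stand-ins for the grok patterns; the real ones may differ.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3} \w+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<level>[A-Z]+)"
    # Nested optional groups: b is only attempted if a matched, c only if b matched.
    rf"(?:\s+(?P<a>{SEG})(?:\s+(?P<b>{SEG})(?:\s+(?P<c>{SEG}))?)?)?"
    r" - (?P<message>.*)$"
)


def parse(line):
    """Return a dict of the fields that matched, or None if the line does not parse."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Groups that did not participate come back as None; drop them so the
    # field is not populated at all.
    return {k: v for k, v in m.groupdict().items() if v is not None}


if __name__ == "__main__":
    prefix = "2019-01-01 00:00:00:000 GMT [pool-12-thread-34] INFO"
    print(parse(prefix + " Segment1 - Actual log message text here"))
    print(parse(prefix + " - Actual log message text here"))
```

In this sketch, the negative lookahead in SEG stops a segment from swallowing the " - " separator, so a message that itself contains " - " still ends up entirely in the message field. Whether the grok processor in 6.4.1 skips unmatched captures the same way is something I have not verified.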