Hi, I'm new to Logstash (5.5) and I'd love some advice from the community on this problem:
I have a number of Postgres servers that each run a monitoring tool. The tool runs a query multiple times per second that generates a multi-line log message, and those messages aren't useful to me. However, dropping them is tricky because the messages take this form:
[33793-1] < 2017-07-16 15:19:24.936 UTC >LOG: execute <unnamed>: SELECT t0.job_id, MIN(t0.last_modified_time) FROM COORD_ACTIONS t0 WHERE (t0.status = $1) GROUP BY t0.job_id HAVING MIN(t0.last_modified_time) < $2
followed by a line for the parameters:
[33793-2] < 2017-07-16 15:19:24.936 UTC >DETAIL: parameters: $1 = 'READY', $2 = '2017-07-16 15:09:24.93'
In this case, 33793 (along with the host name) is a good enough identifier to link the two lines. The first line is [33793-1] and the second is [33793-2].
The first line is easy to find, of course, because it's always the same.
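To make that concrete, here's a rough sketch of how I imagine matching the first line with grok. The pattern, field names, and the `monitor_query` tag are all my own placeholders, not working config:

```conf
filter {
  # Sketch only: tag the monitoring tool's LOG line. The field names
  # (session_id) and the tag (monitor_query) are hypothetical.
  grok {
    match => {
      "message" => "\[%{NUMBER:session_id}-1\] < %{TIMESTAMP_ISO8601} UTC >LOG: execute <unnamed>: SELECT t0.job_id, MIN\(t0.last_modified_time\) FROM COORD_ACTIONS"
    }
    add_tag => ["monitor_query"]
    tag_on_failure => []
  }
}
```

That much seems straightforward; it's the second line that I can't see how to handle cleanly.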
The second is harder because of the embedded timestamp, and because the same DETAIL format may also carry valid parameters for a query that I want to keep.
The lines don't necessarily occur contiguously.
I could conceivably use the "aggregate" filter plugin, but it requires restricting the pipeline to a single filter worker thread, which I'd rather avoid since I have quite a few filters and reasonably high volume.
Because the segments of a message don't necessarily arrive contiguously, using multiline on the source side in my Filebeat configuration probably won't be effective either.
If possible, I would like to simply periodically emit into elasticsearch counts of these pairs and drop the original lines.
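For the counting part, I was imagining something like the "metrics" filter, which periodically flushes an event carrying counters. This is only a sketch of the idea, and it assumes a hypothetical `monitor_query` tag that an earlier grok filter would have added to matching lines:

```conf
filter {
  if "monitor_query" in [tags] {
    # Sketch: count the lines we're about to drop; the metrics filter
    # emits a new event with the counters every flush_interval seconds.
    metrics {
      meter          => ["monitor_query_lines"]
      flush_interval => 60
      add_tag        => ["metric"]
    }
    drop {}
  }
}
```

The periodic metric events (tagged "metric") could then be routed to Elasticsearch in the output section while the originals never get indexed. But this only covers the first line of each pair, which brings me back to the matching problem above.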
More generally, what approaches are there for correlating multiple segments of a message (where multiline doesn't apply) without resorting to "aggregate", especially when the number of segments isn't known at the time the first segment is received?
I would be grateful to hear how others have solved similar problems.