We have 40 instances sending logs in parallel to a centralized Logstash via Filebeat. We have tried multiline handling in both Logstash (the multiline codec) and Filebeat. Both seem to work.
However, we see a few issues with the Logstash approach; for example, Filebeat queues files when log rotation happens during periods of high volume (we run Logstash with the default number of workers, i.e. 1).
Is there any recommendation for handling multiline messages? Should we use Filebeat or Logstash?
Performing multiline processing as close to the source as possible is generally preferable, so I would recommend doing it in Filebeat. This allows you to send data either to a message queue or to load balance across multiple Logstash instances without having to worry about lines that are supposed to be merged at a later stage being split up.
Multiline in Filebeat works with a start-line pattern as well, and it works for application tracebacks. There is just a slight difference in syntax from Logstash grok. I will post some examples for Python tracebacks soon.
Inspired by your suggestion on Filebeat multiline, I've dug a bit deeper and found that it can, as you said, merge multiline events based on a start line if you use something like this:
multiline:
  pattern: "start-line-pattern"
  negate: true
  match: after
The only thing I haven't worked out is how to make it match the end of an event.
This multiline configuration keeps waiting for the next line matching "start-line-pattern", at which point it emits the current event and starts another. As a result, the last event is not emitted until the configured timeout is reached.
Is there any way to specify a flush pattern that ends the multiline event?
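The behaviour described above (the last event being held back until either the next start line or the timeout arrives) can be illustrated with a small Python simulation. This is only a simplified model of a start-line pattern with negate: true and match: after, not Filebeat's actual implementation; the function name `merge_multiline`, the sample log lines and the date-based start pattern are all made up for illustration.

```python
import re


def merge_multiline(lines, start_pattern):
    """Simplified model of pattern + negate: true + match: after.

    Returns (emitted, pending): the events already emitted, and the event
    still buffered because no later start line has been seen yet.
    """
    start = re.compile(start_pattern)
    emitted, buffer = [], []
    for line in lines:
        if start.search(line) and buffer:
            # A new start line closes the event collected so far.
            emitted.append("\n".join(buffer))
            buffer = []
        # Every line, start line or not, joins the current event.
        buffer.append(line)
    pending = "\n".join(buffer) if buffer else None
    return emitted, pending


# Hypothetical sample input: events begin with a date, tracebacks follow.
log_lines = [
    "2016-01-01 ERROR first failure",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "2016-01-01 ERROR second failure",
    "Traceback (most recent call last):",
]

emitted, pending = merge_multiline(log_lines, r"^\d{4}-\d{2}-\d{2}")
print(len(emitted))  # events emitted so far
print(pending)       # event still waiting for a start line or a timeout
```

In this model the first event is emitted only when the second start line shows up, and the second event stays in `pending`. In the real setup that pending event is what sits in memory until the timeout fires, which is why a pattern that explicitly ends an event would be useful.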