After days of trial and error I've found inconsistent regex pattern matches with Filebeat, using patterns that seem to work when tested elsewhere (Go and Regex101).
Filebeat configuration:
filebeat:
  prospectors:
    - input_type: log
      paths:
        - /var/opt/SFTP7_PC/logs/session_logs/session_SFTP_*.log
      encoding: plain
      include_lines:
        - 'S T O R*'
        - 'R E T R*'
      fields_under_root: false
      document_type: log
      scan_frequency: 10s
      harvester_buffer_size: 16384
      max_bytes: 10485760
      multiline:
        pattern: '\b\|(\d{2})\/(\d{2})\/(\d{4})\s+(\d{2}):(\d{2}):(\d{2}).(\d{3})\|\b'
        negate: false
        match: after
        max_lines: 3
      tail_files: false
This configuration works, but it only seems to match and concatenate the lines containing "226-Upload". See here.
All log lines begin with SESSION. Is there an obvious reason why a simple pattern fails to match any lines? I've tried ^SESSION, \bSESSION\b, and even the original pattern with the literal pipes removed: \b(\d{2})\/(\d{2})\/(\d{4})\s+(\d{2}):(\d{2}):(\d{2}).(\d{3})\b
With any of these, the Filebeat debug log always backs off:
2017-01-12T11:09:41Z INFO Harvester started for file: /var/opt/SFTP7_PC/logs/session_logs/session_SFTP_1069386.log
2017-01-12T11:09:41Z DBG End of file reached: /var/opt/SFTP7_PC/logs/session_logs/session_SFTP_1069386.log; Backoff now.
2017-01-12T11:09:42Z DBG End of file reached: /var/opt/SFTP7_PC/logs/session_logs/session_SFTP_1069386.log; Backoff now.
2017-01-12T11:09:43Z DBG End of file reached: /var/opt/SFTP7_PC/logs/session_logs/session_SFTP_1069386.log; Backoff now.
2017-01-12T11:09:44Z DBG End of file reached: /var/opt/SFTP7_PC/logs/session_logs/session_SFTP_1069386.log; Backoff now.
2017-01-12T11:09:45Z DBG End of file reached: /var/opt/SFTP7_PC/logs/session_logs/session_SFTP_1069386.log; Backoff now.
2017-01-12T11:09:46Z DBG Flushing spooler because of timeout. Events flushed: 1
2017-01-12T11:09:46Z DBG No events to publish
TBH, from your post I don't fully understand the actual problem you're facing.
Which filebeat version are you using?
Do you have some sample logs - content - for testing with this playground?
Multiline support has a timeout for the case where an event is still being buffered but the file hasn't been updated for N seconds; the timeout flushes the current buffer. Have you tried disabling the timeout? E.g. if your upload/download takes longer than the multiline timeout, the lines won't be combined correctly.
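In the nested config style used above, the timeout sits under multiline, roughly like this (a sketch; 5s is the documented default, and whether a value of 0 fully disables it may depend on your Filebeat version):

      multiline:
        pattern: '\b\|(\d{2})\/(\d{2})\/(\d{4})\s+(\d{2}):(\d{2}):(\d{2}).(\d{3})\|\b'
        negate: false
        match: after
        timeout: 5s    # default; raise it so slow transfers aren't flushed mid-event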
From your link I can't really tell exactly which lines you want to merge; it would help to see a more complete log. I wonder if you really want multiline (multiline only merges successive lines) or whether you need some form of joining multiple lines by some key. The problem is: what if multiple concurrent uploads/downloads are active? Will the logs for the two sessions be intermixed? In the latter case you might have a 'bigger' problem, as there doesn't really seem to be a consistent session ID being logged for all messages.
What's the issue with backoff? By default the reader in Filebeat uses multiple processing layers: 1) read file, 2) split lines, 3) multiline, and so on. The backoff comes from the first layer: the reader has reached the end of the file (it cannot read any more content, because the OS signals that the end of the file has been reached). When there is no new content, the reader waits and retries reading from your log (see the backoff and max_backoff options).
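Both are prospector-level options; as a sketch, with what I believe are the defaults:

      backoff: 1s        # wait after reaching EOF before checking the file again
      max_backoff: 10s   # the wait grows on each idle retry, up to this bound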
Thanks for the reply. I finally have a working pattern match, and it's a lot simpler than first thought. To be clear, all I needed was to match a single line and then concatenate the following two into a single string for Logstash groks. The mistake I made was writing a pattern that matched ALL lines!
To help others avoid the same error: the multiline regex pattern should ONLY match the initial line, not the consecutive lines that need to be appended.
I had to negate the next two entries that matched the multiline pattern, and the pattern itself was simplified to '\s+R\*$', which successfully matches all lines ending in S T O R* and R E T R*. So the only things changed in the config were the simplified pattern and negate set to true.
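The multiline section now reads as follows (the pattern and negate are the changes; match and max_lines carry over unchanged from the config above):

      multiline:
        pattern: '\s+R\*$'
        negate: true
        match: after
        max_lines: 3

With negate: true and match: after, Filebeat appends every line that does not match the pattern to the last line that did, which gives exactly the "match only the initial line" behaviour described above.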
As these are SFTP logs, regarding concurrent uploads/downloads (well spotted, by the way): each SFTP session creates its own log file, which avoids intermixed messages.
As larger files will inevitably take longer to transfer, I increased the values to backoff: '5m' and max_backoff: '10m'.
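That is, at the prospector level:

      backoff: '5m'
      max_backoff: '10m'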