Hi there, hope you're doing well.
I have a question about how the multiline codec processes an input file. According to the file input docs, the file is read in chunks and lines are extracted from those chunks. Say we have a large input file (about 5 GB) in read mode, and the multiline codec has a pattern that merges lines into one event. If the codec reaches the end of a chunk without having hit the max_bytes or max_lines threshold, and the pattern that would close the event has not matched yet, what does the codec do? Does it carry on into the next chunk looking for the pattern, or does it end the merged event there and start over with the next chunk?
In other words, do I need to change the default file_chunk_size of the file input when the input file is very large and a single multiline event may have to merge 1000 or more lines?
My input is below. Given the above, I'm not sure whether I really need a large value for the file_chunk_size option.
input {
  file {
    mode => "read"
    path => [...]
    file_chunk_size => 1024000
    codec => multiline {
      pattern => 'WARC-Type: request'
      negate => true
      what => previous
      auto_flush_interval => 10
      max_lines => 100000
      charset => "UTF-8"
    }
    file_completed_action => "log_and_delete"
    file_completed_log_path => ...
  }
}
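
To make the two possibilities concrete, here is a small Python sketch of what I mean. It is only a toy model and not Logstash's implementation: `merge_events`, the `carry_across_chunks` flag, and the sample chunks are hypothetical names made up for illustration, and each chunk is assumed to already be split into whole lines. It mimics negate => true with what => previous, and shows how the resulting events would differ if pending lines were carried over a chunk boundary versus cut off there.

```python
import re


def merge_events(chunks, pattern, carry_across_chunks):
    """Toy model of negate => true, what => previous.

    A line matching `pattern` starts a new event; every other line is
    appended to the event currently being built.
    """
    start = re.compile(pattern)
    events, pending = [], []
    for chunk in chunks:
        for line in chunk:
            if start.search(line) and pending:
                # A new start line closes the event built so far.
                events.append("\n".join(pending))
                pending = []
            pending.append(line)
        if not carry_across_chunks and pending:
            # Hypothetical behaviour: the event is cut at the chunk boundary.
            events.append("\n".join(pending))
            pending = []
    if pending:
        # End of input: emit whatever is still buffered.
        events.append("\n".join(pending))
    return events


# Two chunks; the first event's lines straddle the chunk boundary.
chunks = [
    ["WARC-Type: request", "a", "b"],
    ["c", "WARC-Type: request", "d"],
]

carried = merge_events(chunks, r"WARC-Type: request", carry_across_chunks=True)
cut = merge_events(chunks, r"WARC-Type: request", carry_across_chunks=False)

print(len(carried), "events when pending lines are carried over")
print(len(cut), "events when each chunk is flushed on its own")
```

If the codec behaves like the first case, the chunk size should only affect how the file is read and not where events are split; if it behaves like the second, an event longer than one chunk would be broken apart.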