Multiline codec behavior on large input files

Mahdi_Moazami · March 20, 2024, 5:43pm

Hi there,
I hope you're doing well.
I have a question about how the multiline codec processes an input file. According to the file input docs, the file is read in chunks and lines are extracted from those chunks. Say we have a large input file (about 5 GB) in read mode, and the multiline codec uses a pattern to merge lines into one event. If the codec reads through a chunk without reaching the max_bytes or max_lines thresholds, and the pattern has still not matched by the end of that chunk, what does the codec do? Does it continue into the next chunk looking for the pattern, or does it close the merged event and start over with the next chunk?
In other words, do I need to change the default file_chunk_size in the file input when the input file is very large and a single multiline event may need to contain 1000 or more lines?

My input is below. I'm not sure whether I really need to set a large value for the file_chunk_size option, given the scenario described above.

input {
    file {
        mode => "read"
        path => [...]
        file_chunk_size => 1024000
        codec => multiline {
            pattern => 'WARC-Type: request'
            negate => true
            what => previous
            auto_flush_interval => 10
            max_lines => 100000
            charset => "UTF-8"
        }
        file_completed_action => "log_and_delete"
        file_completed_log_path => ...
    }
}

Badger · March 20, 2024, 9:04pm

Chunk processing is way upstream of the codec. The file input uses the filewatch library to read the file. filewatch reads a chunk and splits it into lines. If a chunk doesn't end with a line delimiter, it continues reading the line from the next chunk. Once it has a complete line, it passes it to the file input, which then passes the line to the multiline codec.

I cannot see any way for the chunk size to affect the multiline codec.
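
To illustrate that flow, here is a rough Python sketch. It is not the actual filewatch or multiline codec code (both are Ruby); the function names `lines_from_chunks` and `multiline_events` are made up for this example, and emitting a trailing partial line at end of file is a simplification. It only mimics the idea: a partial line is carried over to the next chunk, and the grouping step only ever sees complete lines.

```python
import re


def lines_from_chunks(data: bytes, chunk_size: int, delimiter: bytes = b"\n"):
    """Yield complete lines, carrying a partial line over to the next chunk."""
    buffer = b""
    for start in range(0, len(data), chunk_size):
        buffer += data[start:start + chunk_size]
        # Everything before the last delimiter is complete; the rest is partial.
        *complete, buffer = buffer.split(delimiter)
        for line in complete:
            yield line.decode("utf-8")
    if buffer:
        # Simplification for this sketch: emit whatever is left at end of input.
        yield buffer.decode("utf-8")


def multiline_events(lines, pattern: str):
    """Mimic negate => true, what => previous.

    A line that does NOT match the pattern is appended to the pending event;
    a line that matches closes the pending event and starts a new one.
    """
    regex = re.compile(pattern)
    pending = []
    for line in lines:
        if regex.search(line) and pending:
            yield "\n".join(pending)
            pending = []
        pending.append(line)
    if pending:
        # Stands in for the flush that auto_flush_interval would trigger.
        yield "\n".join(pending)


SAMPLE = (
    b"WARC-Type: request\nheader a\nheader b\n"
    b"WARC-Type: request\nheader c\n"
)


def events_for(chunk_size: int):
    """Run the whole sketch pipeline for a given chunk size."""
    lines = lines_from_chunks(SAMPLE, chunk_size)
    return list(multiline_events(lines, "WARC-Type: request"))


if __name__ == "__main__":
    for size in (1, 7, 1024):
        print(size, events_for(size))
```

In this sketch the events come out the same for every chunk size, because the grouping function never sees chunk boundaries, only whole lines.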

Mahdi_Moazami · March 20, 2024, 9:31pm

Thanks Badger. Now I understand the workflow, and that the codec is independent of the file read process.

