Hello,
I was facing an issue with Logstash not parsing the last line of my XML log file. The fix I found was to add auto_flush_interval => 1 to my multiline codec. I also want my pipeline to be ordered, so I set pipeline.workers: 1 and left pipeline.ordered at its default (auto, which preserves order when there is a single worker). This works fine with a single file. My problem is that I have to parse multiple files at once, and with auto_flush_interval enabled the order gets mixed up.
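For reference, a minimal sketch of the kind of setup described above. The path and the multiline pattern are hypothetical placeholders, not my real values; only auto_flush_interval and the two pipeline settings come from the description.

```
# logstash.yml (settings mentioned above):
#   pipeline.workers: 1
#   pipeline.ordered: auto

input {
  file {
    # Hypothetical glob; substitute the real location of the XML logs.
    path => "/var/log/myapp/*.xml"
    start_position => "beginning"
    codec => multiline {
      # Hypothetical pattern: a line starting with "<record" begins a new event.
      pattern => "^<record"
      # With negate => true and what => "previous", every line that does NOT
      # match the pattern is appended to the previous event.
      negate => true
      what => "previous"
      # Flush the pending event after 1 second without new data, so the last
      # record of a file is emitted even though no following line arrives.
      auto_flush_interval => 1
    }
  }
}
```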
The order in which the files are read doesn't matter. What matters is that the lines of each file are read in order. I would like Logstash to finish a whole file before moving on to the next one. What actually happens is that it starts parsing the next file before it has read the last line of the previous one. Is there a condition or setting that ensures it does not move on to the next file until it has read the last line?
The documentation distinguishes "complete" reading from "striped" reading, in which the file input interleaves chunks from several files. With the default options it reads 4611686018427387903 chunks of 32 KB from a file before it starts processing the next one, so in effect it reads to EOF. I cannot think why a multiline codec would change that.
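To make those defaults concrete, here is a sketch of a file input with the two chunking options written out explicitly. The path is a hypothetical placeholder. The values are the documented defaults, so stating them should not change behavior.

```
input {
  file {
    # Hypothetical glob; substitute the real location of the files.
    path => "/var/log/myapp/*.xml"
    # Size of each chunk read from a file, in bytes (32 KB).
    file_chunk_size => 32768
    # Number of chunks read from one file before moving to the next.
    # 4611686018427387903 is 2^62 - 1, so a file is read to EOF in practice.
    file_chunk_count => 4611686018427387903
  }
}
```

Setting file_chunk_count to a small value such as 1 is what would switch the input to striped reading, so leaving it at the default should keep each file contiguous.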