Now, if I start a Logstash instance and then copy a 50k-line log file into the "_in" directory (or let a process generate it there), I would expect to see all 50,000 events/lines in the "test.log" output file. Instead, I get output of varying length: on any given run I could be missing 2, 3, 10, 20, or sometimes no events at all.
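For context, the pipeline is essentially a file input watching "_in" and a file output writing to "_out/test.log". A minimal sketch of such a configuration follows (the absolute paths are placeholders and the absence of a filter section is an assumption; my real config may differ):

```
input {
  file {
    path => "/path/to/_in/*.log"      # watched input directory (placeholder path)
    start_position => "beginning"     # read newly discovered files from their first line
    # stat_interval => 1              # seconds between checks for new content
  }
}

output {
  file {
    path => "/path/to/_out/test.log"  # collected events (placeholder path)
  }
}
```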
I wonder if this has to do with the pipeline's throttling. So far I have tried the following without any luck:
Running on both Windows and Linux environments
Trying Logstash versions 1.5.2, 1.5.5 and 2.0.0
Increasing the heap size to 1g (see the snippet after this list)
Sending events over the network with the lumberjack output instead of writing them to a file
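For reference, the heap increase was applied via the environment variable honoured by the stock startup scripts in these versions (I believe the Windows launcher reads the same variable, set with SET before running logstash.bat):

```
# LS_HEAP_SIZE is read by bin/logstash in 1.5.x/2.0
export LS_HEAP_SIZE=1g
bin/logstash -f test.conf
```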
Please see below an example of the input file's structure.
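Each line is a simple CSV record, and the full file is 50k such lines (exactly what the generator command in step 2 below produces):

```
This, is, a, csv, and, this, is, line, no, 1
This, is, a, csv, and, this, is, line, no, 2
...
This, is, a, csv, and, this, is, line, no, 50000
```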
Unfortunately, the issue reported here is not one of files or new lines failing to be discovered, but I gave it a try anyway, without any luck: for instance, I generated a new file and Logstash identified it, yet some lines were still (randomly) missing from the output, with no obvious reason even when I enabled debug logging.
Hopefully, this can be reproduced easily:
Launch Logstash with the aforementioned configuration
Execute the following command to generate input for Logstash: `for i in {1..50000}; do echo "This, is, a, csv, and, this, is, line, no, $i"; done > _in/testme.log`
Wait until Logstash picks up the file (I believe this is configurable via the file input's stat_interval option)
Notice, by counting the output (e.g. `wc -l _out/test.log`), that the result is not 50,000 lines (e.g. it could be 49994, or some other randomly changing count on each run); the snippet after this list shows how to pinpoint which lines went missing.
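To see exactly which lines were dropped rather than just counting them, a comparison along these lines can help (it assumes each event reaches the output file with its original message intact; if the file output wraps events in JSON, the grep pattern needs adjusting):

```
# Regenerate the expected lines, extract the messages Logstash actually wrote,
# and print the ones that never made it to the output.
for i in {1..50000}; do echo "This, is, a, csv, and, this, is, line, no, $i"; done | sort > /tmp/expected
grep -o 'This, is, a, csv, and, this, is, line, no, [0-9]*' _out/test.log | sort > /tmp/actual
comm -23 /tmp/expected /tmp/actual    # lines unique to /tmp/expected, i.e. missing events
```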