I daily move yesterday's log file into "/home1/.logstash/json_input/deco.json.YYYY.MM.DD"
It's size is approximately 1GB. and usually take 20~30minutes to process all line of the file.
when after yesterday's log file moved into logstash input directory, it is well tailed and out processed to elasticsearch.
one difficult thing I have with this situation,
If logstash or elasticsearch got break while process some line of the large file,
logstash have no sincedb position data, because it didn't reached EOF yet, sincedb position never wrote.
also this configuration have no effect.
sincedb_write_interval
in that case I don't know from which line should I re process the log data that didn't processed.
so, no way to failover.
I think your observation is correct. The underlying filewatch library will read a file in a tight loop until it hits EOF and only afterwards will it consider updating sincedb. This seems like a bug to me—why not consider updating sincedb inside the read-32kb-and-yield loop? Could you please file an issue for the bug at https://github.com/logstash-plugins/logstash-input-file/issues/new? Here's the relevant code:
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.