Suppose we have an input file input.dat and we are reading it in Logstash in tail mode. We set close_older to 10 minutes.
My questions are:
Is the close_older setting deprecated? The docs say it is "retained for backward compatibility", implying that there is a new way of doing the same thing.
If we don't write to input.dat for 10 minutes, will LS close the file handle to input.dat? At that time, can an external process delete the file (or archive it somewhere)?
Please note: we need to use tail mode, not read mode.
No, it is not deprecated. Before "read" mode was implemented, if you wanted to read a set of files you would use a file input in tail mode with start_position => beginning. close_older was then used to free up file handles as it finished reading the files. You can still do that, which is why the option is retained for backward compatibility.
On UNIX, anyone who has permission to write to a directory can remove the directory entry (which is what a lot of folk mean by deleting the file). If it is the only directory entry and there are no file handles that have the file open, then the file is deleted. So an external process can delete the file Logstash is reading. Logstash will continue to read the file. When close_older kicks in and Logstash's file handle is closed, the space occupied by the file will be freed if there are no other file handles or directory entries pointing to it. See here for more colour.
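If it helps to see the unlink behaviour in action, here is a minimal Python sketch, assuming a POSIX system (on Windows the unlink of an open file would normally fail). The function name read_after_unlink is made up for illustration, and the open reader handle merely stands in for the handle Logstash holds.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, unlink it while a handle is open,
    then read it back through that handle (POSIX semantics assumed)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as writer:
        writer.write(payload)

    reader = open(path, "rb")      # stands in for the handle Logstash holds
    os.unlink(path)                # the "external process" removes the directory entry
    still_listed = os.path.exists(path)
    try:
        data = reader.read()       # the data is still reachable through the open handle
    finally:
        reader.close()             # only after this can the kernel free the space
    return data, still_listed


if __name__ == "__main__":
    data, still_listed = read_after_unlink(b"line 1\nline 2\n")
    print(data, still_listed)
```

The directory entry is gone straight after the unlink, but the bytes stay readable until the last handle is closed.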
Right. I knew that. We want LS to release the file handle as soon as close_older kicks in so that the file space can be cleaned up. We just weren't sure whether LS would release the file handle immediately or wait until it reached max_open_files or whether it just set an internal flag to ignore the file.
Every time it processes watched files (I think this is every five seconds), the code checks whether any files are closeable. If closing old files is enabled, it just checks whether the last time data was read from the file was more than close_older seconds ago. If it was, it closes the file handle.
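Roughly, the check described above could be sketched like this in Python. This is not Logstash's actual code (the real implementation is Ruby); WatchedFile, last_read_at and close_stale are hypothetical names, and the only rule modelled is "close if the last read was more than close_older seconds ago".

```python
import time


class WatchedFile:
    """Hypothetical stand-in for a file the input is tailing."""

    def __init__(self, path, last_read_at):
        self.path = path
        self.last_read_at = last_read_at  # epoch seconds of the last successful read
        self.is_open = True


def close_stale(files, close_older_seconds, now=None):
    """Mark as closed every open file whose last read is older than the threshold.

    Returns the paths that were closed on this pass.
    """
    now = time.time() if now is None else now
    closed = []
    for wf in files:
        if wf.is_open and now - wf.last_read_at > close_older_seconds:
            wf.is_open = False  # a real implementation would close the OS handle here
            closed.append(wf.path)
    return closed


if __name__ == "__main__":
    files = [WatchedFile("input.dat", 0), WatchedFile("fresh.dat", 500)]
    # close_older of 10 minutes, evaluated at t = 700 s
    print(close_stale(files, 600, now=700))
```

The point is that the handle is released on the first pass after the threshold is crossed, rather than waiting for max_open_files to be reached.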
Wait! There is a problem. The docs say that the default value of close_older is 1 hour, but we have seen files that were unreleased and they were older than 1 hour. Are the docs wrong?