How Logstash works for 10,000 lines of log file

The file input tracks state in the sincedb, which records how far it has read into each file. On UNIX it identifies each file by its device numbers and inode number rather than by its name. On Windows it uses something similar.

If you ask the file input to tail foo.log then when data is added to the file it will read just the new data. If foo.log is rotated to foo.log.1 and a new foo.log is created then the file input will see it as a new file and start over at the beginning.
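To make the tracking concrete, here is a minimal Python sketch of the idea. It is not the file input's actual code: `OffsetTracker`, `read_new` and `demo` are made-up names, and the dictionary is only a toy stand-in for the sincedb. It assumes a POSIX-style filesystem where a renamed file keeps its inode.

```python
import os
import tempfile


class OffsetTracker:
    """Toy stand-in for the sincedb: maps (device, inode) to bytes already read."""

    def __init__(self):
        self.offsets = {}

    def read_new(self, path):
        st = os.stat(path)
        # Identify the file by device and inode, not by its name.
        key = (st.st_dev, st.st_ino)
        start = self.offsets.get(key, 0)
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read()
        self.offsets[key] = start + len(data)
        return data


def demo():
    tracker = OffsetTracker()
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "foo.log")
        with open(log, "wb") as f:
            f.write(b"line 1\n")
        first = tracker.read_new(log)    # whole file: nothing recorded yet

        with open(log, "ab") as f:
            f.write(b"line 2\n")
        second = tracker.read_new(log)   # only the appended data

        # Rotate: the old file keeps its inode under the new name,
        # so the freshly created foo.log has a different identity.
        os.rename(log, log + ".1")
        with open(log, "wb") as f:
            f.write(b"fresh 1\n")
        third = tracker.read_new(log)    # treated as a new file, read from 0
    return first, second, third
```

Calling `demo()` returns the three chunks read: the initial content, then only the appended line, then the full content of the new foo.log after rotation.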

Using the inode number to track the file can break. For example, on some filesystems, if you delete foo.log and immediately create a new one, the new file can be given the same inode number. Logstash will then think it has already read that many bytes from the file, and it will ignore data added to the new foo.log until the file is longer than the previous foo.log was.
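The failure can be simulated without touching a real filesystem. The sketch below models only the behaviour described above, not the plugin's real logic; `bytes_to_read`, `simulate_inode_reuse` and the (device, inode) pair are invented for illustration.

```python
def bytes_to_read(offsets, identity, current_size):
    """Return the (start, end) byte range to read next, or None if the
    tracker believes it has already read everything in the file."""
    start = offsets.get(identity, 0)
    if current_size <= start:
        return None
    offsets[identity] = current_size
    return (start, current_size)


def simulate_inode_reuse():
    offsets = {}
    identity = (2049, 131)  # made-up (device, inode) pair
    events = []

    # The old foo.log (1000 bytes) is read completely.
    events.append(bytes_to_read(offsets, identity, 1000))

    # foo.log is deleted and recreated, and the inode is reused.
    # The new file is shorter than the recorded offset, so it is ignored.
    events.append(bytes_to_read(offsets, identity, 200))
    events.append(bytes_to_read(offsets, identity, 1000))

    # Only once the new file outgrows the old one is anything read,
    # and its first 1000 bytes are skipped.
    events.append(bytes_to_read(offsets, identity, 1200))
    return events
```

In this model the first 1000 bytes of the new file are never read, which is the kind of data loss the shortcut risks.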

This is fundamentally a very hard and expensive problem, and the file input uses a cheap shortcut that is usually right. Sometimes it is wrong.
