File input: sincedb general questions

I have some questions about how sincedb and the File input work. My current understanding is that the File input records how far it has read in a particular file by writing to the sincedb file every x seconds or on Logstash shutdown. Each entry holds the file's inode and the read position within it. When Logstash starts up again, it uses this file to find where it previously left off and continues from there.
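To make the description above concrete, here is a minimal Python sketch of reading one sincedb entry. It assumes the space-separated layout `inode major_dev minor_dev position`, optionally followed by a last-active timestamp and the last known path in newer versions of the plugin. The names `SincedbEntry` and `parse_sincedb_line` are made up for illustration and are not part of Logstash.

```python
from typing import NamedTuple, Optional


class SincedbEntry(NamedTuple):
    """One sincedb line (assumed layout, see the note above)."""
    inode: int
    major_dev: int
    minor_dev: int
    position: int                       # byte offset read so far
    last_active: Optional[float] = None  # only present in newer layouts
    path: Optional[str] = None           # only present in newer layouts


def parse_sincedb_line(line: str) -> SincedbEntry:
    # Split at most 5 times so a path containing spaces stays in one field.
    fields = line.strip().split(" ", 5)
    if len(fields) < 4:
        raise ValueError(f"not a sincedb entry: {line!r}")
    inode, major, minor, position = (int(f) for f in fields[:4])
    last_active = float(fields[4]) if len(fields) > 4 else None
    path = fields[5] if len(fields) > 5 else None
    return SincedbEntry(inode, major, minor, position, last_active, path)
```

For example, `parse_sincedb_line("2249887 0 64768 1234").position` gives the saved read offset for inode 2249887.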

My questions are both to help debug a current issue and just for general interest!

  1. When the File input starts, does it first look in its specified path to see what files are there and then compare them to the sincedb? Or does it use the sincedb entries to identify new files in the path?
  2. If the sincedb is really old because Logstash hasn't been run in a while, what happens when a listed inode now points at a different file in the same specified path?
  3. And what if the file that now has that inode is in a different, non-monitored directory elsewhere on the system?
  4. At what point does Logstash remove old entries from sincedb, e.g. entries for files which have been deleted while Logstash is running? Is it just every x seconds when it writes new entries to the sincedb?
  5. Since the File input is read-only and doesn't take a lock on the file it's reading, what happens if the file is removed while it's being read?

Thanks for any answers to any of these :smile:

Caveat emptor: I'm no expert with the filewatch library that Logstash uses, but I have been poking around the source code and done some experimentation.

  1. Apart from an exhaustive search there's no way to find files based on the inode, hence Logstash can't start from the sincedb file. Either way, it's always the files matching the filename pattern(s) supplied by the user that determine what should be monitored.
  2. Then you're going to have a bad time. Logstash will pick up the old sincedb entry and start reading the file from the old position. Well, unless the old offset is greater than the current file size, in which case it'll assume a copy/truncate rotation has taken place and start from the top.
  3. Again, Logstash only cares about sincedb entries if they match files that are supposed to be watched.
  4. A sincedb entry is deleted and possibly recreated when a) a file that existed at the last filename expansion doesn't exist anymore, b) the inode of a monitored file changes, or c) a file is smaller than it used to be.
  5. Logstash will keep reading the file until it discovers that the file (at least with the current inode number) is gone.
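
To illustrate answers 1 to 3, here is a rough Python sketch of the resume decision as I understand it. This is not the filewatch code; `resume_offset` and the `sincedb` dict (inode to saved offset) are names invented for the example, and the real library may differ in details.

```python
import os
from typing import Dict, Optional


def resume_offset(path: str, sincedb: Dict[int, int]) -> Optional[int]:
    """Pick where to resume reading `path`, given saved inode -> offset pairs.

    Returns None when the file's inode has no saved entry, so the caller
    can apply whatever policy it uses for previously unseen files.
    """
    st = os.stat(path)
    saved = sincedb.get(st.st_ino)
    if saved is None:
        return None
    if saved > st.st_size:
        # Saved offset lies beyond the end of the file: treat it as a
        # copy/truncate rotation and start from the top.
        return 0
    # Otherwise trust the saved offset, even if the inode now belongs to
    # a different file than the one the entry was written for.
    return saved
```

Note that the lookup starts from a path matched by the pattern and goes to the sincedb, never the other way round. A stale entry only matters when a watched file happens to carry the same inode, which is exactly the "bad time" case in answer 2.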

Most of the interesting code is in watch.rb and should be fairly easy to read even if you don't know Ruby.


> A sincedb entry is deleted and possibly recreated when a) a file that existed at the last filename expansion doesn't exist anymore, b) the inode of a monitored file changes, or c) a file is smaller than it used to be.

This isn't true. AFAICT Logstash never prunes sincedb entries.

Hi everyone,
I am new here. I have a question regarding the 'file input' plugin and the sincedb file. Let me try to describe the situation:
Traces are written to a specific log file all the time. Generally Logstash is turned on, but sometimes I would like to turn it off. In that case I would like to skip the traces written in the meantime; I do not need to process them. When I turn Logstash on again, I want to process only newly incoming traces.

So my question is: how can I handle only the logs that arrive while Logstash is turned on?
Should I manipulate/update the sincedb file? Is that the only proper solution?
Or is there a flag that makes it behave this way?

I would really appreciate any help. Any suggestions are welcome.

@krzysztof_pl, please start a new thread for your unrelated question.

Understood. I have created a new topic. Thank you.

I have found that even after a file matching the file pattern provided to Logstash has been deleted, sincedb is not updated. So when a new file comes in and happens to reuse the inode recorded in sincedb for the previously deleted file, Logstash will not pick up the new file. I have written a script to deal with this. I am not sure whether this is supposed to be handled by Logstash or not?
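
For anyone wanting something similar, a cleanup along those lines could look roughly like the Python sketch below. It is a hypothetical illustration, not the script mentioned above. It assumes each sincedb line starts with the inode number, and the function name `prune_sincedb_lines` is made up. I would only run something like this while Logstash is stopped, since the plugin may rewrite the sincedb file at any time.

```python
import glob
import os
from typing import List


def prune_sincedb_lines(lines: List[str], pattern: str) -> List[str]:
    """Keep only sincedb lines whose inode belongs to a file matching `pattern`."""
    live_inodes = set()
    for path in glob.glob(pattern):
        try:
            live_inodes.add(os.stat(path).st_ino)
        except FileNotFoundError:
            # File vanished between the glob and the stat; skip it.
            continue
    kept = []
    for line in lines:
        fields = line.split()
        # Assumption: the first field of every entry is the inode number.
        if fields and fields[0].isdigit() and int(fields[0]) in live_inodes:
            kept.append(line)
    return kept
```

This only helps if it runs after the old file is deleted and before the inode is reused. Once a new file already carries the old inode, the entry looks live and is kept.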