Logstash - Schedule files


I need to read new files inside one directory. I have a script that sends new files to this directory every 10 minutes.

I created this setup in my pipeline:

When the first files were inserted into the directory everything worked OK, but after I stopped the pipeline to include the file_completed_action and mode options, the pipeline didn't work anymore.

It only works when I run /usr/share/logstash/bin/logstash -f.

Do I need to set sincedb_path? How do I set it?


What changes did you make?

Please share what your pipeline looked like and what you changed.

Also, avoid sharing screenshots of configurations; share them as text using the preformatted text option, the </> button. That makes them easier to read and to copy if someone needs to replicate your setup.

By default, Logstash stores the sincedb file inside {path.data}/plugins/input/file/; if you installed using rpm or deb, this will be /var/lib/logstash/plugins/input/file/.

Once Logstash has read a file, it won't read it again unless you remove the sincedb file. If you are still testing your pipeline and want to reprocess some files, it helps to set sincedb_path to /dev/null.

sincedb_path => "/dev/null"

The file input tracks files based on the inode. If you delete a file after reading it, a newly created file may reuse the same inode, and the sincedb entry will then cause the new file to be ignored. There has been an open issue to get this fixed for several years.
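While testing, an input along these lines avoids the stale-sincedb problem by discarding read positions entirely (a minimal sketch; the path and glob are assumptions, adjust them to your directory):

```
input {
  file {
    # hypothetical directory; replace with the directory your script writes to
    path => "/data/incoming/*.log"
    # send positions to /dev/null so files are reprocessed after every restart
    sincedb_path => "/dev/null"
  }
}
```

Note that with sincedb_path set to /dev/null, Logstash forgets everything it has read whenever it restarts, so this is only appropriate while testing or when already-read files are removed from the directory.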


I can't copy the pipeline text, because I access an environment where I have no way to copy and paste, only to capture the screen. Sorry...

I need to ingest logs from files that are dropped into one directory. After that I need to delete each file that was just ingested, because I don't have much space on this machine.

The file names are all different from each other.

I'm thinking about using Filebeat to read these files and feed them into the same pipeline through a beats input. What do you think?


I think I understood you. If I delete the file, I need to create the new file with a different name for Logstash to start reading it. If I use the same name, Logstash won't read the file, because Logstash will think it is the same file. Correct?

The important thing is the inode number, which identifies the file. The name is just a string that is mapped to an inode number by a directory entry.

$ touch foo; ls -li foo; rm foo
13448262 -rw-rw-r-- 1 ec2-user ec2-user 0 Dec 27 19:48 foo
$ touch bar; ls -li bar; rm bar
13448262 -rw-rw-r-- 1 ec2-user ec2-user 0 Dec 27 19:48 bar

I create a file and it uses inode number 13448262. I then delete it and create a new file with a different name. It uses the next available inode number, which is the recently freed 13448262, so Logstash would think they were the same file.

The inode reuse that Badger mentioned could be an issue in some cases, but with an interval of 10 minutes between file creations I think the chances of this happening are pretty low.

But in your use case I think you can leave sincedb_path as /dev/null, since you are dealing with new files only.

If you choose to use Filebeat, keep in mind that it cannot delete files, so you would need an external tool to delete them. I would use Logstash for this case and configure it to delete the files after processing.
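To have Logstash itself delete the files, the file input's read mode can be combined with file_completed_action. A minimal sketch (the path is a hypothetical placeholder):

```
input {
  file {
    # hypothetical directory; replace with the directory your script writes to
    path => "/data/incoming/*.log"
    # read each file to EOF instead of tailing it
    mode => "read"
    # delete the file once it has been fully read
    file_completed_action => "delete"
    sincedb_path => "/dev/null"
  }
}
```

Since completed files are deleted, they can never be re-read, which is why sincedb_path => "/dev/null" is safe here and also sidesteps the inode-reuse issue for restarts.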
