I am trying to read files from a particular path using the Logstash file input plugin, and I am using file_completed_log_path and sincedb_path to track the list of files that have been processed. It was working fine: Logstash was continuously updating both the completed-files log and the sincedb, and reading only new files. But today it suddenly started reading the old files again and creating duplicate entries for the same files in both the completed-files log and the sincedb.
There have been no modifications to the file_completed_log_path or sincedb_path settings; we have used the same paths from the start. Could you please suggest what the root cause of this issue may be, and why Logstash is reading the files again even though they have already been read and recorded in the sincedb?
Please find below sample entries from the file_completed_log_path for my pipeline. The same file has been read twice: one read happened on Nov 12 and another one today (Nov 18).
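As a quick check, a small sketch like the one below can list which paths appear more than once in that log. It assumes the completed-files log simply contains one completed file path per line, and the path in the script is hypothetical:

```python
#!/usr/bin/env python3
"""List file paths that appear more than once in a file_completed_log_path log.

Assumes the log contains one completed file path per line; the path below is
hypothetical and should be replaced with the real file_completed_log_path.
"""
from collections import Counter

COMPLETED_LOG = "/var/log/logstash/completed_files.log"  # hypothetical path

with open(COMPLETED_LOG) as f:
    counts = Counter(line.strip() for line in f if line.strip())

for path, count in sorted(counts.items()):
    if count > 1:
        print(f"{path} was logged {count} times")
```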
The device number has changed from 40 to 41, which means Logstash thinks the files are on a different disk. That can happen with NFS mounts when they are re-mounted (e.g. after a reboot). There is nothing Logstash can do about it.
You may be able to fix it on the client/server side (e.g. here), but that will be OS-specific.
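If you want to confirm the device-number change, a minimal sketch along these lines compares what the sincedb recorded with what the mount reports now. It assumes the sincedb line format of inode, major device, minor device, byte offset, ...; the sincedb_path and mount point below are hypothetical:

```python
#!/usr/bin/env python3
"""Compare the device numbers recorded in a Logstash sincedb with the device
number of the watched mount as it looks right now.

Assumes the sincedb line format "inode major minor position [expiry path]";
the paths below are hypothetical and should be adjusted.
"""
import os

SINCEDB = "/var/lib/logstash/sincedb_myfiles"   # hypothetical sincedb_path
WATCH_DIR = "/mnt/nfs/input"                    # hypothetical NFS mount

st = os.stat(WATCH_DIR)
print(f"current device for {WATCH_DIR}: "
      f"major={os.major(st.st_dev)} minor={os.minor(st.st_dev)}")

recorded = set()
with open(SINCEDB) as f:
    for line in f:
        fields = line.split()
        if len(fields) >= 3:
            # fields: inode, major device, minor device, byte offset, ...
            recorded.add((fields[1], fields[2]))

print("device numbers recorded in the sincedb:", sorted(recorded))
```

If the major/minor pair in the sincedb no longer matches the current one, none of the existing entries match and the files are treated as new.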
Thank you for the update. The NFS team is now planning to migrate the filesystem to Azure NFS. After they make that change, will our Logstash reprocess all the files again? Is there any way to handle this situation without re-reading the old files?
We are not using lvcreate to create volumes; we are just mounting the share on our server using fstab.
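Would something like the sketch below be a reasonable workaround, i.e. stopping Logstash after the migration and rewriting the device-number fields in the sincedb to match the new mount? This assumes the same sincedb line format as above and that the files keep their inodes across the migration (otherwise the entries will not match anyway); the paths are hypothetical:

```python
#!/usr/bin/env python3
"""Rewrite the major/minor device numbers in a Logstash sincedb so that
existing entries match the device numbers of a re-mounted filesystem.

Run only while Logstash is stopped, and keep a backup of the sincedb.
Assumes the sincedb line format "inode major minor position [expiry path]"
and that inodes are unchanged; the paths below are hypothetical.
"""
import os
import shutil

SINCEDB = "/var/lib/logstash/sincedb_myfiles"   # hypothetical sincedb_path
WATCH_DIR = "/mnt/nfs/input"                    # hypothetical new mount

st = os.stat(WATCH_DIR)
new_major, new_minor = str(os.major(st.st_dev)), str(os.minor(st.st_dev))

shutil.copy2(SINCEDB, SINCEDB + ".bak")  # keep a backup before touching it

rewritten = []
with open(SINCEDB) as f:
    for line in f:
        fields = line.split()
        if len(fields) >= 3:
            # replace the recorded major/minor device numbers with the new ones
            fields[1], fields[2] = new_major, new_minor
            rewritten.append(" ".join(fields))
        else:
            rewritten.append(line.rstrip("\n"))

with open(SINCEDB, "w") as f:
    f.write("\n".join(rewritten) + "\n")
```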