I have a requirement to perform an action after Filebeat has finished harvesting a file (e.g. move the file to an archive directory) and was wondering what the best practices/options are for this?
There is nothing I can see in Filebeat that directly supports this, and I'm guessing it's a fairly common use case, so I am curious about other people's solutions.
Indeed Filebeat doesn't support this directly. Would it work to just periodically run those tasks for files older than a few hours, to make (almost) sure that Filebeat had enough time to read them?
You could read the registry file and compare the offset with the file size, which would be an indication that Filebeat has completely read the file, but that seems like overkill to me.
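If the time-based approach is good enough, a periodic task along these lines could do it. This is a minimal sketch meant to be run from cron; the directory paths and the six-hour threshold are assumptions for illustration, not anything Filebeat prescribes:

```python
#!/usr/bin/env python3
"""Archive log files that have not been modified for a few hours.

Illustrative sketch only: the log directory, archive directory and age
threshold are assumptions and would need to match your own setup.
"""
import shutil
import time
from pathlib import Path

LOG_DIR = Path("/var/log/myapp")              # hypothetical directory Filebeat harvests
ARCHIVE_DIR = Path("/var/log/myapp/archive")  # hypothetical archive target
MAX_AGE_SECONDS = 6 * 3600                    # "older than a few hours" heuristic

def archive_old_logs():
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    now = time.time()
    for log_file in LOG_DIR.glob("*.log"):
        # Only move files that have not been written to for MAX_AGE_SECONDS,
        # assuming Filebeat has had enough time to finish reading them.
        if now - log_file.stat().st_mtime > MAX_AGE_SECONDS:
            shutil.move(str(log_file), str(ARCHIVE_DIR / log_file.name))

if __name__ == "__main__":
    archive_old_logs()
```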
Well, the thing is that Filebeat never knows when, or if, a file is complete. A process might write to a log file only once every 24 hours, so Filebeat always assumes a file can potentially still be modified.
The admin/operator (e.g. via logrotate or the service's own configuration) knows best when a file is complete. Given that, a script (run from cron or integrated into logrotate) that compares the file size with the offset Filebeat has recorded in its registry sounds like the best solution to date to me. Filebeat will not remove a file from the registry as long as the file is still present and being tracked. The registry content is JSON, so you can easily parse and process it, as in the sketch below.
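Here is a rough sketch of that registry check. Note that the registry path and its exact layout differ between Filebeat versions (older versions keep a single JSON array in one file, newer versions keep newline-delimited JSON under a registry directory), so the path and key names below are assumptions to verify against your own installation:

```python
#!/usr/bin/env python3
"""Archive files whose Filebeat registry offset has caught up with the file size.

Sketch assuming an older-style registry: a single JSON array of entries with
"source" and "offset" keys (e.g. /var/lib/filebeat/registry). Newer Filebeat
versions store newline-delimited JSON under a registry directory, so the
loading step would need adapting.
"""
import json
import shutil
from pathlib import Path

REGISTRY_FILE = Path("/var/lib/filebeat/registry")  # assumed registry location
ARCHIVE_DIR = Path("/var/log/myapp/archive")        # hypothetical archive target

def archive_fully_read_files():
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    entries = json.loads(REGISTRY_FILE.read_text())
    for entry in entries:
        source = Path(entry["source"])
        if not source.exists():
            continue
        # If the recorded offset has reached the current file size, Filebeat
        # has read everything written so far. The file should already be
        # rotated/completed by the writing process before relying on this.
        if entry["offset"] >= source.stat().st_size:
            shutil.move(str(source), str(ARCHIVE_DIR / source.name))

if __name__ == "__main__":
    archive_fully_read_files()
```

Something like this could run from cron or be called from a logrotate hook, as suggested above.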