Delete files using the "registry" file

Hi,

I have a use case in which multiple large files are pushed to a server. They are read by Filebeat and sent to Logstash.

I have written a script that reads the registry file and gets the names of the files that have been acknowledged by Filebeat.

My question is: is it safe to delete these files? When exactly is a file's entry updated in the registry? Is it updated after Filebeat has completely processed the file, or as soon as Filebeat starts processing it?

Any advice would be appreciated.

Filebeat saves the state of a file as soon as it first encounters it. At that point the offset is 0, because no events have been acknowledged yet. Each time the output returns an ACK, the offset is updated to the number of bytes acknowledged so far. If the file is completely processed, meaning the offset in the registry points to the end of the file, and no more lines are being written to it, it should be safe to delete the file.
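The condition above (offset at the end of the file, and no more lines coming in) could be checked roughly like this. This is a minimal sketch, not Filebeat behaviour: the function name `is_safe_to_delete` and the `quiet_seconds` threshold are hypothetical, and "no more lines are coming in" is approximated by requiring that the file has not been modified for a while, which Filebeat itself does not check.

```python
import os
import time


def is_safe_to_delete(path, registry_offset, quiet_seconds=300):
    """Hypothetical helper: True if every byte of `path` has been acknowledged
    (registry offset == file size) AND the file has not been modified for at
    least `quiet_seconds` seconds (an assumed stand-in for "no more lines")."""
    st = os.stat(path)
    # All bytes acknowledged by the output.
    fully_acked = int(registry_offset) == st.st_size
    # Nobody has written to the file recently.
    idle = (time.time() - st.st_mtime) >= quiet_seconds
    return fully_acked and idle
```

The idle check guards against deleting a file that happens to be fully acknowledged at the moment the script runs but is still being appended to.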


Thank you for the clarification. Here is a Python script that deletes files based on the information in the registry file:

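#!/usr/bin/env python
import json
import os


def watcher(filepath="/var/lib/filebeat/registry"):
    """Delete the files in the registry whose offset equals their size."""
    # The registry file (Filebeat 5.x/6.x format) is one JSON array of file states.
    with open(filepath, "r") as f:
        states = json.load(f)
    for state in states:
        try:
            source = state["source"]
            size = os.stat(source).st_size
            # Every byte has been acknowledged: the file is fully processed.
            if int(state["offset"]) == size:
                os.remove(source)
            else:
                print("File not fully processed: %s (offset %s of %d bytes)"
                      % (source, state["offset"], size))
        except (KeyError, ValueError, OSError) as e:
            # e.g. the file was already removed but its state is still listed.
            print(e)


if __name__ == "__main__":
    watcher()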
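Because `os.remove` cannot be undone, it may be worth running the same comparison in a dry-run mode first. The sketch below separates the decision from the deletion: `plan_deletions` is a hypothetical name, and it assumes the registry has already been parsed into a list of dicts with the `source` and `offset` keys used in the script above. It only reports which files would be removed.

```python
import os


def plan_deletions(entries):
    """Hypothetical dry-run helper: split registry entries into files whose
    offset equals their size (would be deleted) and files still pending.
    Entries whose file no longer exists are skipped."""
    done, pending = [], []
    for entry in entries:
        source = entry["source"]
        try:
            size = os.stat(source).st_size
        except FileNotFoundError:
            continue  # stale registry entry: the file is already gone
        offset = int(entry["offset"])
        if offset == size:
            done.append(source)
        else:
            pending.append((source, offset, size))
    return done, pending
```

Printing `done` and `pending` before calling `os.remove` on each path in `done` gives a chance to check the result.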
