FileNotFound not caught by file_rotator


(Simonecottini) #1

Hi all,

I configured Filebeat to write its output to a file on the filesystem. If you delete the output file (or one of its parent folders) while Filebeat is running, Filebeat keeps running and doesn't report any error.

I tried it on various operating systems, including Windows Server (2003, 2008, 2012), Windows 8 and Linux (CentOS, RHEL and Mint), and this bug occurred on all of them.

I rebuilt the suite (version 1.2), adding a check in beats/libbeat/logp/file_rotator.go that tests whether the file exists and returns an error if it doesn't. Furthermore, for my needs, I replaced lines 87 to 91 of beats/libbeat/outputs/fileout/file.go with a WTF log line in order to panic the goroutine, which could otherwise lose data.

Here is my code (only the file_rotator part). I'm aware that it's not the best solution.

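// Fail the write if the current output file has disappeared.
// fmt.Errorf is needed here because errors.New does not accept format arguments.
if !rotator.FileExists(0) {
    return fmt.Errorf("file %s does not exist anymore", filepath.Join(rotator.Path, rotator.Name))
}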

Cordially,
Simone Cottini


(Steffen Siering) #2

This is more or less normal behavior enforced by the OS (you will see lots of other tools behaving similarly).

When you delete a file that is still open, the file is only 'unlinked' (it can no longer be found by name), but it still uses space on disk and you can still read from and write to it. As the OS does not report any error (by design), Filebeat does not catch any error. The file is finally deleted from disk (and the disk space is freed) once the process closes its file handle. This is why using logrotate often requires sending a signal to the service (or restarting it) on file rotation: you can rename or delete a file that is still in use, and the file handle remains valid.

Why do you want to delete the current file?


(Simonecottini) #3

First of all thank you for your reply.

To answer your question: I don't need to delete the file. The point is that my company is working on a log management solution and, to simplify the infrastructure, we use a Windows network share as a repository for the files generated by Filebeat. The problem is that if that share goes down, we lose a lot of log lines. So we would rather shut down the Filebeat process than keep it running.


(Steffen Siering) #4

"Remote Windows network share" and "simplify" are words I don't normally hear in one sentence :slight_smile:

Why not have Filebeat send to Elasticsearch/Logstash or even Kafka? Filebeat can deal with network errors. When a remote drive is used for output, Filebeat has no chance of reacting to network failures, as this is in the domain of the OS (which might buffer data or not).

Is there any post-processing on the files written by Filebeat?

On Linux the mount options hard/soft exist ($ man mount.cifs):

hard
The program accessing a file on the cifs mounted file system will hang when the server crashes.

soft
(default) The program accessing a file on the cifs mounted file system will not hang when the server crashes and will return errors to the user application.

This makes me wonder what happens if the Windows remote really goes down. That scenario is very different from just deleting the file.

You can try running Filebeat with the INFO log level or, for manual testing on the command line, run Filebeat with -e -v and shut down your Windows server. Is Filebeat still updating the registry file? Is it reporting errors? Is it blocked (maybe until Windows reboots)?


(Simonecottini) #5

Because of network security policies, we don't want to open non-default ports for each server in a network with hundreds of nodes.

I have just done that test and, if I remember correctly, Filebeat flushes the spooler and updates the registry file on the first loop after the network crash. After that, Filebeat doesn't do anything more.


(Steffen Siering) #6

What's a default port? If we define the Beats default port to be 5044, is that enough? Can you use some kind of port forwarding, e.g. via stunnel + SOCKS5 proxy (which is supported by Filebeat)?

I wonder whether the last spooler flush was only absorbed by OS I/O buffering, or whether Filebeat really managed to write the data to the file (i.e. the network crash happened afterwards). Stop Filebeat, restart the Windows machine, inspect the logs and compare them to the original log source and the offsets in the registry file. As file I/O is handled by the OS, we're limited by OS/driver capabilities in this case.

Why not mount the source machines' file systems on the Logstash machine? Network shares are a pain in general, but blocked reads (due to no data in the buffer) might be handled better by the OS than blocked writes due to the remote being unavailable.

