Filebeat not closing deleted files redux

Apologies, I know this issue has been asked several times, but still not sure exactly how to configure my way out of it.

We are running Filebeat 1.3.0 on Linux. Our application rotates log files, and during major error conditions it can log to such a degree that filebeat cannot keep up. During these times we run into disk space issues because filebeat still holds file handles to deleted files.

We have our filebeat.yml as follows:

  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      paths:
        - /foo/log/bar/server.log
      input_type: log
      ignore_older: 15m
      document_type: jbosslog
      scan_frequency: 10s

Log rotation renames files to server.log.1 through server.log.40. I think what happens is that we get so backed up that server.log.40 has not finished sending to logstash when the next rotation happens, and the file that was server.log.40 is deleted for good.

So we end up with a pile up like this:

# lsof |grep -i deleted
filebeat   3367      root    1r      REG              253,2 52428928     131082 /usr/local/nuance/log/id/server.log.40 (deleted)
filebeat   3367      root    2r      REG              253,2 52428849     132141 /usr/local/nuance/log/id/server.log.40 (deleted)
filebeat   3367      root    4r      REG              253,2 52428848     131120 /usr/local/nuance/log/id/server.log.40 (deleted)
filebeat   3367      root    5r      REG              253,2 52428846     131155 /usr/local/nuance/log/id/server.log.40 (deleted)
filebeat   3367      root    6r      REG              253,2 52428897     132116 /usr/local/nuance/log/id/server.log.40 (deleted)
filebeat   3367      root    7r      REG              253,2 52428963     131134 /usr/local/nuance/log/id/server.log.40 (deleted)

It is my understanding that "close_older" is not coming into play in this case since that clock does not apply if we are not caught up on a file. Is this correct?

I am not using "force_close_files" since I would like to avoid losing data as much as possible, and this condition (the app logging excessively) is rare. Normally filebeat might get a little behind, but not by that much, so we expect to catch up on a file before it is finally deleted. force_close_files will close the file handle every time the file name changes, and since these files are being rotated they will not be written to again. So if we are not caught up on the file, whatever we have not yet sent to logstash will be lost, right?

I would like the flexibility to configure something in between: don't force close on every file rotation, but when we get really far behind like this and the file is deleted (as opposed to just being rotated), close it even if we are not caught up on the file. Is there any way I can accomplish this?

One option I was thinking of is writing a script to detect when filebeat has a handle to a deleted file (maybe just one called server.log.40?) and restart filebeat.

As we realised there are some limitations with the close_* options in filebeat 1.x, we introduced new config options in the 5.0 release that should help in your case.

force_close_files is now split up into two options: close_renamed and close_removed. I think the second one is the one you are looking for. Another option that could be interesting in your case is close_timeout: https://www.elastic.co/guide/en/beats/filebeat/5.0/configuration-filebeat-options.html#_close_timeout
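To make this concrete, a minimal 5.0-style prospector sketch with these options could look something like the following (the wildcard path and the timeout value are only placeholders, not recommendations for your setup):

  filebeat.prospectors:
  - input_type: log
    paths:
      - /foo/log/bar/server.log*
    close_removed: true    # close the file handle once filebeat notices the file was removed
    close_renamed: false   # keep reading across renames, i.e. across rotations
    close_timeout: 30m     # example value; closes the harvester after this time even if reading is not finished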

Important to note for all these options: you will only lose data if the file is removed while no harvester currently has it open. If a file is only rotated and the scan finds it again later, reading just continues at the old position.

Your description of close_older (now close_inactive) is correct.

Thanks for your reply. [quote="m0thra, post:1, topic:60946"]
force_close_files will close the file handle every time the file name changes, and since these files are being rotated they will not be written to again. So if we are not caught up on the file (when it is rotated), whatever we have not yet sent to logstash will be lost, right?
[/quote]

Is my understanding correct about what will happen when force_close_files is set to true and a file we are not caught up on is renamed?

Any estimate of when 5.0 will be GA?

For force_close_files: if the file is found again by the prospector after rotation, it will be opened again and reading will be finished. But for this you must make sure the rotated file is also covered by the paths you defined. If the file is removed before it is found again after scan_frequency, the log lines which were not read yet will be lost.
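For example, something along these lines (a sketch based on the prospector from your first post; the wildcard is an assumption about your rotation naming, server.log.1 up to server.log.40):

  prospectors:
    -
      paths:
        - /foo/log/bar/server.log*
      input_type: log
      # close the handle on rename; the rotated file still matches paths and is picked up again
      force_close_files: true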

5.0 beta1 is around the corner and will probably be shipped in the next 1-2 weeks. GA should be in the next 1-2 months.

So if I understand you correctly, one change would be to change the path to /foo/log/bar/server.log*.

Then with force_close_files set to true, the file handle will be closed when the file rolls, but the file will be picked up again by the prospector since the new file name, "server.log.1", matches the path pattern. In that case, will filebeat start from the beginning of the file, resending log lines that have already been sent to logstash? Or does filebeat know that it has already read part of the file?

We will be upgrading to 5.0 as soon as it is GA, so thanks for the timing info on that.

Filebeat will know that it is the same file and will start at the old position. Files are tracked based on inode and not file name.

OK, great. I had assumed that when a file handle is force closed, the reference to the inode is removed (how are those entries ever cleaned up?). So given that, are there any downsides to this setup?

  paths:
    - /foo/log/bar/server.log*
  input_type: log
  ignore_older: 15m
  document_type: jbosslog
  scan_frequency: 10s
  force_close_files: true

force_close_files will eliminate the space-consuming handles to files that are deleted, and the new path pattern will allow files to continue being read after they are renamed (starting where they left off). The only data loss will then occur when a file is deleted for good (as opposed to being renamed) while filebeat is not fully caught up on that file. BTW -- thanks for all your help on this.

The main downside (which could also be an upside) is that if filebeat gets behind on reading for whatever reason, files could be deleted before reading has finished (as you described).

About the cleanup: it differs between 1.x and 5.x. In 1.x the cleanup happens when a new file with a different inode but the same path shows up and overwrites the old state. By that time the previous state should already have moved to a new entry, as the file was rotated to a new file name. This caused several issues in the past, which is one of the reasons the state handling was rewritten in the 5.x release.

In 5.x, states are no longer path dependent. The config options clean_removed and clean_inactive were added. clean_removed is enabled by default, so the states of files which were removed are also cleaned up.
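As a rough sketch of how that could look in 5.x, reusing the values from your prospector (the clean_inactive value is only an example; it has to be larger than ignore_older + scan_frequency):

  filebeat.prospectors:
  - input_type: log
    paths:
      - /foo/log/bar/server.log*
    ignore_older: 15m
    scan_frequency: 10s
    clean_removed: true    # default: drop registry entries for files that were removed
    clean_inactive: 20m    # must be greater than ignore_older + scan_frequency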

If you want to go with the 5.0 release, just today the beta1 was published: https://www.elastic.co/blog/elastic-stack-release-5-0-0-beta1

I think we will start moving to 5.0. Is filebeat 5.0 compatible with these ELK versions (see below), or do we need to upgrade everything at once?

elasticsearch-2.4.0
kibana-4.5.1-1
logstash-2.3.3

Filebeat is compatible with Elasticsearch 2.x.
