Filebeat 5.0 beta 1 is not closing deleted files

chris.mcdermott · October 11, 2016, 9:19pm

Hello,

I am having an issue with beta1. It is holding open a lot of deleted files even though the files have not been updated for a time much greater than close_older. I am not setting close_after so it should be defaulting to 5 minutes.

<stapp@g2t4454c:filebeat5> lsof | grep 25938 | grep log/local | grep '(deleted)' | head filebeat 25938 stapp 41r REG 253,8 52432329 1443043 /usr/local/st/mount1/sa-stats/log/local/acceptfile/acceptfile-20161010103824.log (deleted) filebeat 25938 stapp 43r REG 253,8 52428947 1444258 /usr/local/st/mount1/sa-stats/log/local/fast/fast-20161009080815.log (deleted) filebeat 25938 stapp 57r REG 253,8 20971721 1443090 /usr/local/st/mount1/sa-stats/log/local/axedalistener/axedalistener-20161008231614.log (deleted) filebeat 25938 stapp 79r REG 253,8 20971560 1443032 /usr/local/st/mount1/sa-stats/log/local/axedalistener/axedalistener-20161008222049.log (deleted) filebeat 25938 stapp 111r REG 253,8 52429002 1444294 /usr/local/st/mount1/sa-stats/log/local/fast/fast-20161009091241.log (deleted) filebeat 25938 stapp 116r REG 253,8 52428921 1443038 /usr/local/st/mount1/sa-stats/log/local/acceptfile/acceptfile-20161010103334.log (deleted) filebeat 25938 stapp 123r REG 253,8 20971573 1443018 /usr/local/st/mount1/sa-stats/log/local/axedalistener/axedalistener-20161008220239.log (deleted) filebeat 25938 stapp 124r REG 253,8 52428911 1444304 /usr/local/st/mount1/sa-stats/log/local/fast/fast-20161009093057.log (deleted) filebeat 25938 stapp 130r REG 253,8 52429843 1443118 /usr/local/st/mount1/sa-stats/log/local/acceptfile/acceptfile-20161010111444.log (deleted) filebeat 25938 stapp 166r REG 253,8 52435873 1443124 /usr/local/st/mount1/sa-stats/log/local/acceptfile/acceptfile-20161010111955.log (deleted)

<stapp@g2t4454c:filebeat5> date Tue Oct 11 21:05:16 UTC 2016

The pattern of the filenames include a timestamp of when the file was rotated. The file was not updated after this timestamp.

Its not that close_older does not work at all it does seem to be closing some files. I am wondering if the problem is only related to deleted files.

Sampling the log for some of the files...

<stapp@g2t4454c:filebeat5> grep fast-20161009093057.log logs/* logs/filebeat.log:2016-10-11T13:38:35Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log logs/filebeat.log.2:2016-10-11T13:36:59Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log logs/filebeat.log.4:2016-10-09T15:16:57Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log logs/filebeat.log.6:2016-10-09T15:08:06Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log
<stapp@g2t4454c:filebeat5> grep acceptfile-20161010111444.log logs/* logs/filebeat.log:2016-10-11T13:38:35Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/acceptfile/acceptfile-20161010111444.log logs/filebeat.log.2:2016-10-11T13:37:08Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/acceptfile/acceptfile-20161010111444.log
<stapp@g2t4454c:filebeat5> grep fast-20161009093057.log logs/* logs/filebeat.log:2016-10-11T13:38:35Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log logs/filebeat.log.2:2016-10-11T13:36:59Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log logs/filebeat.log.4:2016-10-09T15:16:57Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log logs/filebeat.log.6:2016-10-09T15:08:06Z INFO Harvester started for file: /opt/mount1/sa-stats/log/local/fast/fast-20161009093057.log
Note the differing paths is the result of the use of a symbolic link in the prospector config.

Here we can see that some files are being closed.

2016-10-11T15:16:18Z INFO File is inactive: /opt/mount1/sa-stats/log/local/acceptfile/acceptfile-20161010154747.log. Closing because close_inactive of 5m0s reached. 2016-10-11T15:49:23Z INFO File is inactive: /opt/mount1/sa-stats/log/local/axedalistener/axedalistener-20161010082957.log. Closing because close_inactive of 5m0s reached. 2016-10-11T15:54:46Z INFO File is inactive: /opt/mount1/sa-stats/log/local/axedalistener/axedalistener-20161010082957.log. Closing because close_inactive of 5m0s reached.

Thanks in advance for any help you can provide.

ruflin · October 12, 2016, 6:09am

Can you share your config file? Can you add some details on the exact way you do file rotation? (rename file to add tiemstamp, create new file)? Also can you elaborate on the way you use symlinks?

chris.mcdermott · October 12, 2016, 4:48pm

filebeat.yml

filebeat:
    prospectors:
    -
      paths:
        - /opt/mount1/sa-stats/log/local/fast/*.log
        - /opt/mount1/sa-stats/log/local/filepuller/*.log
        - /opt/mount1/sa-stats/log/local/acceptfile/*.log
        - /opt/mount1/sa-stats/log/local/axedalistener/*.log
        - /opt/mount1/sa-stats/log/local/poller/*.log
        - /opt/mount1/sa-stats/log/local/groupnot/*.log
        - /opt/mount1/sa-stats/log/local/pluginhandler/*.log
      encoding: plain
      input_type: log
      close_removed: true
      clean_removed: true
      ignore_older: 168h
      document_type: log4j-itg
      multiline:
        pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}T([0-9]{2}:){2}[0-9]{2},[0-9]"
        negate: true
        match: after
    -
      paths:
        - /opt/mount1/sa-stats/log/local/legacy/legacy-*.log
      encoding: plain
      input_type: log
      document_type: legacy-itg
      multiline:
        pattern: ^(.*)-*[[0-9]{4}-[0-9]{2}-[0-9]{2}T([0-9]{2}:){2}[0-9]{2},[0-9]
        negate: true
        match: before
  registry_file: /usr/local/st/filebeat5/registry
output:
  logstash:
    hosts: ["logstash.example.com9700"]
    worker: 15
    loadbalance: true
logging:
  to_syslog: false
  to_files: true
  files:
    path: /usr/local/st/filebeat5/logs
    name: filebeat.log
  level: info

The applications are written in Java and use log4j to do the file rotation. I believe that they use

close
rename
create new

Here is an example configuration

            <RollingFile name="RollingFileLocal" fileName="/opt/mount1/sa-stats/log/local/fast/fast.log"
                                    filePattern="/opt/mount1/sa-stats/log/local/fast/fast-%d{yyyyMMddHHmmss}.log">
                    <PatternLayout>
                            <Pattern>%d{ISO8601} %5p [%t] [%X{logger.client}] [%X{logger.module}] [%X{logger.vendor} %X{logger.product} %X{logger.productId} %X{logger.uuid} %X{logger.fid} %X{logger.eid} %X{logger.comp} %X{logger.LocalTxnID}] - %X{logger.prefix} %m%n</Pattern>
                    </PatternLayout>
                    <Policies>
                            <SizeBasedTriggeringPolicy size="50 MB" />
                    </Policies>

            <DefaultRolloverStrategy max="200">


            <Delete basePath="/opt/mount1/sa-stats/log/local/fast/" maxDepth="1">
              <IfFileName glob="fast-*.log">
                  <IfAny>
                    <IfAccumulatedFileSize exceeds="10 GB" />
                    <IfAccumulatedFileCount exceeds="200" />
                    <IfLastModified age="10d" />
                  </IfAny>
              </IfFileName>
            </Delete>
          </DefaultRolloverStrategy>

            </RollingFile>

Regarding symlinks, The filebeat config uses a path which contains a symlink.

<stapp@g2t4454c:filebeat5> ll /opt/mount1
lrwxrwxrwx 1 root root 20 Aug 22 19:41 /opt/mount1 -> /usr/local/st/mount1

ruflin · October 13, 2016, 7:15am

There is nothing special I found in the config. close_inactive is not defined which means the default of 5 minutes is used. Logrotation looks like this is the "normal" one.

Any chance that you could share the full log file of filebeat with me?
An other thing I spotted is mount1 in the filebeat config which for me indicates that you are reading logs from a shared drive?

About symlinks: I see, not the files itself are a symlinks but the directory with the files is a symlink. This should have no affect as for filebeat it looks like it reads the original file.

chris.mcdermott · October 13, 2016, 2:48pm

Hi.

mount1 is a not a shared drive. Here filebeat is running on a physical machine and mount1 is mount point for an LVM volume consisting of a single physical disk which is directly attached.

Is there a non-public way I can send you the logs?

ruflin · October 16, 2016, 1:43pm

@chris.mcdermott You can share it with me via email. Is the discussname@elastic.co

chris.mcdermott · October 20, 2016, 6:18pm

Sorry for being dense, but I can't figure out what the exact email address is.

I tried discussname@elastic.co and filebeat@elastic.co. Both bounced.

chris.mcdermott · October 20, 2016, 6:21pm

Third guess, beats@elastic.co, was the charm

chris.mcdermott · October 20, 2016, 6:23pm

Or maybe not...

Hello chris.mcdermott@example.com,

We're writing to let you know that the group you tried to contact (beats) may not exist, or you may not have permission to post messages to the group. A few more details on why you weren't able to post:

You might have spelled or formatted the group name incorrectly.
The owner of the group may have removed this group.
You may need to join the group before receiving permission to post.
This group may not be open to posting.

If you have questions related to this or any other Google Group, visit the Help Center at https://support.google.com/a/elastic.co/bin/topic.py?topic=25838.

Thanks,

elastic.co admins

tudor · October 21, 2016, 11:02am

I think he meant ruflin @ elastic.co

ruflin · October 24, 2016, 7:24am

@chris.mcdermott Sorry for the puzzle It is the one that @tudor posted.

ruflin · October 24, 2016, 2:39pm

@chris.mcdermott Thanks for the log files. Looking at your log files shows me lots of sending errors. As long as events cannot be sent to the output, the harvester will keep the file open. A workaround here is using close_timeout but this will lead to data loss in your case.

More important seems to me to investigate why quite frequently the sending of events to logstash fails. Interesting is that not all events fail, but only a subset.

2016-10-11T13:40:05Z INFO Non-zero metrics in the last 30s: libbeat.logstash.published_but_not_acked_events=4096 libbeat.logstash.call_count.PublishEvents=4 libbeat.publisher.published_events=4037 registrar.writes=2 libbeat.logstash.published_and_acked_events=3980 libbeat.logstash.publish.write_bytes=826186 registrar.states.update=4096 publish.events=4096 libbeat.logstash.publish.read_bytes=210 libbeat.logstash.publish.read_errors=2

That means the connection works but for example Logstash cannot keep up with all the events or network is flaky. How many filebeat instances are their sending the events to Logstash?

chris.mcdermott · October 24, 2016, 3:47pm

There are a lot of filebeat instances sending to logstash. Notice the problem is that logstash is dropping the connection (see the EOF errors). Its the latest version of logstash and we had upped the client timeout to a pretty hight number (300s I believe), but for some reason it prematurely terminates the connections.

ruflin · October 25, 2016, 7:23am

As I have the suspicion that the not closing of deleted files is related to the connection issue, I suggest we focus first on the connection issues. Do you see any errors / messages on the Logstash side? Could it be that Logstash cannot keep up with the load of all the filebeat instances sending data? How many events per second do you have on the LS side?

system · November 1, 2016, 9:19pm

This topic was automatically closed after 21 days. New replies are no longer allowed.

Topic		Replies	Views
Filebeat not closing file handle even after ignore_older Beats filebeat	4	1935	January 17, 2017
Filebeat hold already deleted files Beats filebeat	2	830	November 16, 2016
Filebeat not releasing file after it's been deleted Beats filebeat	4	1343	April 25, 2018
Filebeat Still holding the log files even after i upgrade from 5.1.2 to 5.5.2 Beats filebeat	3	351	April 20, 2018
Holding onto handles of deleted files Beats	2	1275	February 10, 2017

Filebeat 5.0 beta 1 is not closing deleted files

Related topics