On restart Harversters are opened for ignored files


(Stuart Fraser) #1

Impact

Unable to restart filebeat as expected after ignoring a lot of files.

Description

When filebeat is restarted Harversters open every file even if they should be ignored by the ignore_older setting.

Environment

filebeat-1.2.2
Unix Suse (SLES) 11.4 64 bit

Steps to Reproduce

  1. Use the following filebeat.yml

    filebeat:
    prospectors:
    -
    paths:
    - oldfile*.log
    input_type: log
    ignore_older: 2h
    registry_file: filebeat.registry
    output:
    console:
    shipper:
    logging:
    files:
    rotateeverybytes: 10485760 # = 10MB
    level: info

  2. Create a whole lot of old files using

bash
for i in {1..50000}; do touch --date="2009-11-23 19:43:57" oldfile$i.log; done

  1. Make sure the registry file does not exist by using

rm filebeat.registry

  1. Run filebeat using

filebeat -e -v -c filebeat.yml

  1. Observe Harversters are opened for all the ignored files as well a a bunch of file descriptor errors such as:

2016/05/13 21:39:21.206773 log.go:113: INFO Harvester started for file: oldfile1.log
2016/05/13 21:39:21.206941 log.go:113: INFO Harvester started for file: oldfile10008.log
....
2016/05/13 21:39:21.795646 log.go:113: INFO Harvester started for file: oldfile17191.log
2016/05/13 21:39:21.795917 log.go:113: INFO Harvester started for file: oldfile17193.log
2016/05/13 21:39:21.796116 log.go:248: ERR Failed opening oldfile17195.log: open oldfile17195.log: too many open files
2016/05/13 21:39:21.796177 log.go:103: ERR Stop Harvesting. Unexpected file opening error: open oldfile17195.log: too many open files
2016/05/13 21:39:21.795437 log.go:113: INFO Harvester started for file: oldfile17186.log
2016/05/13 21:39:21.795556 log.go:113: INFO Harvester started for file: oldfile1719.log
2016/05/13 21:39:21.795905 log.go:113: INFO Harvester started for file: oldfile17194.log
2016/05/13 21:39:21.796044 log.go:248: ERR Failed opening oldfile17196.log: open oldfile17196.log: too many open files
2016/05/13 21:39:21.796364 log.go:103: ERR Stop Harvesting. Unexpected file opening error: open oldfile17196.log: too many open files
2016/05/13 21:39:21.796352 log.go:248: ERR Failed opening oldfile17198.log: open oldfile17198.log: too many open files
2016/05/13 21:39:21.796389 log.go:103: ERR Stop Harvesting. Unexpected file opening error: open oldfile17198.log: too many open files
2016/05/13 21:39:21.796477 log.go:248: ERR Failed opening oldfile172.log: open oldfile172.log: too many open files
2016/05/13 21:39:21.796504 log.go:103: ERR Stop Harvesting. Unexpected file opening error: open oldfile172.log: too many open files
2016/05/13 21:39:21.796479 log.go:248: ERR Failed opening oldfile17197.log: open oldfile17197.log: too many open files


(ruflin) #2

Thanks a lot for the detailed report and reproduction steps. I will look into this and report back here.


(ruflin) #3

I tried to reproduce this but so far without success. In my tests it skips the files because of ignore_older (as it should). There is a log line directly after startup that states the ignore_older time, can you share this line?

2016/05/17 08:29:55.743044 prospector.go:132: INFO Set ignore_older duration to 2h0m0s

(Stuart Fraser) #4

I may have missed a step ...

2a) Create at least 1 new file

touch outfileA.log

Any new file and all ignored older files are added to the registry. When you restart filebeat 1.2.2 opens all the older than files.


(ruflin) #5

Ok, now I can reproduce it. What I did is now:

  • Start filebeat with all the ignore_older files
  • Stop filebeat
  • Start filebeat again (without removing the state)

(ruflin) #6

It is not directly a bug but was the expected behaviour to make sure, in case one of the ignore_older files actually got new content, that the new content is read. But it is not nice so I opened a PR with a fix that checks if the offsets are the same: https://github.com/elastic/beats/pull/1649

This is also related to the discussion here: https://github.com/elastic/beats/issues/1600


(Stuart Fraser) #7

Thanks!


(system) #8