Closing file after its contents have been read

I'm using Filebeat 8.14.3 to collect lots of different files, and close.reader.on_eof doesn't seem to be working. Each file only needs to be read once and then closed. If you need more of the filebeat.yml, please let me know.

I am sure it's something I did, but I can't figure it out.

Filestream example in filebeat.yml

  - type: filestream
    enabled: true
    id: commands
    paths:
      - /usr/share/filebeat/sos/*/*/sos_commands/date/*
      - /usr/share/filebeat/sos/*/*/sos_commands/foobar_os/foobarcmd/*
      - /usr/share/filebeat/sos/*/*/sos_commands/foobar_os/user config/config_current.txt
      - /usr/share/filebeat/sos/*/*/sos_commands/hardware/dmidecode
      - /usr/share/filebeat/sos/*/*/sos_commands/logrotate/logrotate_debug
    parsers:
      - multiline:
          pattern: '^{.*}$' 
          negate: true
          match: after
    processors:
      - add_fields:
          fields:
            dest: commands
    close.reader.on_eof: true

Can you share how you're testing this? What are you seeing that implies close.reader.on_eof isn't working?

Here is a little background. I receive a tar archive and extract it into a directory that the Filebeat container (Docker) has access to and is monitoring. Other filestream inputs in the same filebeat.yml monitor that directory as well.

When the tar is extracted, I see Filebeat's memory go from MB to GB, and it doesn't go back down. Eventually the Docker service crashes because too many files are open. As a test, I deleted some of the extracted files to see whether Filebeat's memory would go down, and it did.

I can run Filebeat in debug to find better proof, but I don't know what selectors to choose. Would it be harvester?

Do I have close.reader.on_eof: true on the correct level in the yaml?

If you are dealing with a very large number of files I would try:

  1. Limiting the number of harvesters via harvester_limit, perhaps to 25-100 or so from the default of unlimited
  2. Setting prospector.scanner.check_interval to something longer than 10s like 30s to reduce how often the files are polled
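As a rough sketch, assuming the input looks like the one you posted, both settings would sit at the input level next to close.reader.on_eof. The values 50 and 30s are only illustrative picks, not tuned recommendations, and the single path is shortened from your list:

  - type: filestream
    id: commands
    paths:
      - /usr/share/filebeat/sos/*/*/sos_commands/date/*
    # Cap on harvesters started in parallel for this input (illustrative value)
    harvester_limit: 50
    # Scan for new or changed files less often than the 10s default
    prospector.scanner.check_interval: 30s
    close.reader.on_eof: true

Keep in mind that harvester_limit applies per input, so with several filestream inputs in the same filebeat.yml each one would need its own limit.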

That setting does look like it's set in the right place.

Thank you for the direction. I will give it a go :crossed_fingers: