Filebeat isn't harvesting the logs from the last paths

We are using the ELK stack to manage our application logs.
In our Filebeat config we harvest from 30 different paths containing files that are updated every second (they update every second only on the prod machines; on the dev machines we have significantly fewer logs).
Our log files share the same names and are not deleted until they grow old and we stop using them.
Lately we found out that the logs from the last paths in the configuration file (.yml) on the prod machines never appear in Kibana.

After investigating, we realized that Filebeat gets stuck on the files in the first paths and never seems to reach the last ones. When I moved the last two paths to the beginning of the file, Filebeat started to register all the log files there and later harvested them.

I looked through the documentation on the Filebeat configuration and saw the close_* options (close_option_config), which seemed like a good idea. But after I tried tuning the configuration, adding the following to the paths that are updated the most:

      close_timeout: 15s
      scan_frequency: 5m

it still didn't help.
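For reference, this is roughly how those options sat inside a prospector when I tried them. The path here is illustrative, not one of ours, and harvester_limit is a further option from the Filebeat 5.x docs that caps concurrent harvesters per prospector; I have not verified it on our setup:

      filebeat:
        prospectors:
          - paths:
              - D:\logs\*\some_busy_path\*.log.txt   # illustrative path
            input_type: log
            # Close each harvester 15s after it starts, even if the file
            # keeps growing, so other files get a turn.
            close_timeout: 15s
            # Look for new/changed files under this path only every 5 minutes.
            scan_frequency: 5m
            # Optional cap on concurrent harvesters for this prospector
            # (0 means no limit).
            harvester_limit: 100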

I would like to hear some opinions on what I can do to solve this problem.
I've put the config here for reference, in case I've missed something else:

My filebeat.yml (before my changes):

      filebeat:
        # List of prospectors to fetch data.
        prospectors:
        # Each - is a prospector. Below are the prospector specific configurations
        -
          paths:
            - D:\logs\*\path1\a_*_Pri_app.log.txt
          input_type: log
          document_type: type1
          multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
          multiline.negate: true
          multiline.match: after
        -
          paths:
            - D:\logs\*\path2\b_*_Paths_app.log.txt
          input_type: log
          document_type: type2
          multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
          multiline.negate: true
          multiline.match: after
        -
          paths:
            - D:\logs\*\path3\c_*_R_app.log.txt
          input_type: log
          document_type: path3
          multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
          multiline.negate: true
          multiline.match: after
        -
          paths:
            - D:\logs\*\path4\d_*_d_app.log.txt
            - C:\logs\*\path4\d_*_d_app.log.txt
          input_type: log
          document_type: path4
          multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
          multiline.negate: true
          multiline.match: after

.........same as the above

          paths:
            - D:\logs\*\path27\S.Coordinator_Z.*.log*
            - C:\logs\*\path27\S.Coordinator_Z*.log*
          input_type: log
          document_type: path27
          multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
          multiline.negate: true
          multiline.match: after
        -
          paths:
            - D:\logs\*\path28\d_*_Tr_app.log.txt
            - C:\logs\*\path28\d_*_Tr_app.log.txt
          input_type: log
          document_type: path28
          multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
          multiline.negate: true
          multiline.match: after
        -
          paths:
            - D:\logs\*\R1_Output\R*\pid_*_rr_*
          input_type: log
          document_type: path29
          multiline.pattern: '<\?xml version="1\.0" encoding="UTF-8"\?>'
          multiline.negate: true
          multiline.match: after  
        -
          paths:
            - D:\logs\*\R2_Output\R*\pid_*_rr_*
          input_type: log
          document_type: path30
          multiline.pattern: '<\?xml version="1\.0" encoding="UTF-8"\?>'
          multiline.negate: true
          multiline.match: after

          registry_file: "C:/ProgramData/filebeat/registry"

Which version of Filebeat are you using? Could you share the log file?

One idea I have is that you are running out of open file handles. How many files in total are being harvested, and what is the open file handle limit for the user that Filebeat runs as?

The default scan_frequency is 10s. If you have a very large number of files and scan too often, this can put the disk under heavy load, and the largest scans can take very long to finish. But we would know more here with the logs.

Hi @ruflin,
I'm using the 5.3.0 Windows version.
I am harvesting a lot of files: the total number of files in the logs folder is 6,371,987, with a total size of 112,031,073,015 bytes. The regex filters some of them out, and most of them are not updated every second.
We have about 9,000 files that are updated/published every hour (most of them every few minutes or seconds).

About your question:

what is the open file handle limit for the user that Filebeat runs as?

I don't know how to check that.

A random part of my log file:

2017-11-12T20:26:53-05:00 INFO Home path: [C:\Program Files (x86)\g.Sis.Setup\Filebeat\filebeat-5.3.0-windows] Config path: [C:\Program Files (x86)\g.Sis.Setup\Filebeat\filebeat-5.3.0-windows] Data path: [C:\Program Files (x86)\g.Sis.Setup\Filebeat\filebeat-5.3.0-windows\data] Logs path: [C:\Program Files (x86)\g.Sis.Setup\Filebeat\filebeat-5.3.0-windows\logs]
2017-11-12T20:26:53-05:00 INFO Setup Beat: filebeat; Version: 5.3.0
2017-11-12T20:26:53-05:00 INFO Max Retries set to: 3
2017-11-12T20:26:53-05:00 INFO Activated logstash as output plugin.
2017-11-12T20:26:53-05:00 INFO Publisher name: gSIMPAPUA01
2017-11-12T20:26:53-05:00 INFO Flush Interval set to: 1s
2017-11-12T20:26:53-05:00 INFO Max Bulk Size set to: 2048
2017-11-12T20:26:53-05:00 INFO filebeat start running.
2017-11-12T20:26:53-05:00 INFO Registry file set to: C:/ProgramData/filebeat/registry
2017-11-12T20:26:55-05:00 INFO Loading registrar data from C:/ProgramData/filebeat/registry
2017-11-12T20:27:23-05:00 INFO No non-zero metrics in the last 30s
2017-11-12T20:27:32-05:00 INFO States Loaded from registrar: 1307049
2017-11-12T20:27:32-05:00 INFO Loading Prospectors: 28
2017-11-12T20:27:32-05:00 INFO Start sending events to output
2017-11-12T20:27:32-05:00 INFO Starting Registrar
2017-11-12T20:27:32-05:00 INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2017-11-12T20:27:35-05:00 INFO Prospector with previous states loaded: 86
2017-11-12T20:27:35-05:00 INFO Starting prospector of type: log; id: 7213405998840431974 
2017-11-12T20:27:38-05:00 INFO Prospector with previous states loaded: 39
2017-11-12T20:27:38-05:00 INFO Starting prospector of type: log; id: 17831093792610870283 
2017-11-12T20:27:43-05:00 INFO Prospector with previous states loaded: 38
2017-11-12T20:27:43-05:00 INFO Starting prospector of type: log; id: 557048098814185631 
2017-11-12T20:27:46-05:00 INFO Prospector with previous states loaded: 37
2017-11-12T20:27:46-05:00 INFO Starting prospector of type: log; id: 13485594898086406273 
2017-11-12T20:27:49-05:00 INFO Prospector with previous states loaded: 61
2017-11-12T20:27:49-05:00 INFO Starting prospector of type: log; id: 4878773535636047747 
2017-11-12T20:27:51-05:00 INFO Prospector with previous states loaded: 0
2017-11-12T20:27:51-05:00 INFO Starting prospector of type: log; id: 3387719506395576271 
2017-11-12T20:27:53-05:00 INFO Non-zero metrics in the last 30s: publish.events=261 registrar.states.current=1307049 registrar.states.update=125 registrar.writes=1
2017-11-12T20:27:54-05:00 INFO Prospector with previous states loaded: 0
2017-11-12T20:27:54-05:00 INFO Starting prospector of type: log; id: 8522556418946833442 
2017-11-12T20:27:57-05:00 INFO Prospector with previous states loaded: 0
2017-11-12T20:27:57-05:00 INFO Starting prospector of type: log; id: 5831658475057830500 
2017-11-12T20:28:00-05:00 INFO Prospector with previous states loaded: 0
2017-11-12T20:28:00-05:00 INFO Starting prospector of type: log; id: 796047758819781197 
2017-11-12T20:28:02-05:00 INFO Prospector with previous states loaded: 39
2017-11-12T20:28:02-05:00 INFO Starting prospector of type: log; id: 15169931774471087751 
2017-11-12T20:28:05-05:00 INFO Prospector with previous states loaded: 37
2017-11-12T20:28:05-05:00 INFO Starting prospector of type: log; id: 3889062819866289695 
2017-11-12T20:28:08-05:00 INFO Prospector with previous states loaded: 0
2017-11-12T20:28:08-05:00 INFO Starting prospector of type: log; id: 3138113755011636671 
2017-11-12T20:28:11-05:00 INFO Prospector with previous states loaded: 0
2017-11-12T20:28:11-05:00 INFO Starting prospector of type: log; id: 12339791514027369778 
2017-11-12T20:28:14-05:00 INFO Prospector with previous states loaded: 38
2017-11-12T20:28:14-05:00 INFO Starting prospector of type: log; id: 16282258901879547111 
2017-11-12T20:28:17-05:00 INFO Prospector with previous states loaded: 86
2017-11-12T20:28:17-05:00 INFO Starting prospector of type: log; id: 19168137906957421 
2017-11-12T20:28:20-05:00 INFO Prospector with previous states loaded: 57
2017-11-12T20:28:20-05:00 INFO Starting prospector of type: log; id: 12769640348789108575 
2017-11-12T20:28:23-05:00 INFO Non-zero metrics in the last 30s: publish.events=257 registrar.states.update=212 registrar.writes=3
2017-11-12T20:28:25-05:00 INFO Prospector with previous states loaded: 60
2017-11-12T20:28:25-05:00 INFO Starting prospector of type: log; id: 2741752581921039923 
2017-11-12T20:28:25-05:00 INFO Harvester started for file: D:\logs\UA01\Scheduler\Sis.Scheduler_Zen.[11184].log
2017-11-12T20:28:29-05:00 INFO Prospector with previous states loaded: 751
2017-11-12T20:28:29-05:00 INFO Starting prospector of type: log; id: 9096281014884465928 
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.11_Zen.[15484].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.12_Zen.[16988].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[14792].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.9_Zen.[21416].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[9172].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.7_Zen.[14016].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.6_Zen.[22876].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.6_Zen.[5092].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.2_Zen.[16312].log

The continuation of the log file:

2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[15264].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.1_Zen.[18364].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.5_Zen.[23900].log
2017-11-12T20:28:29-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.8_Zen.[10532].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.2_Zen.[22744].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.6_Zen.[13984].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.10_Zen.[7184].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.2_Zen.[9548].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.8_Zen.[22424].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.3_Zen.[22440].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[15332].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.7_Zen.[14676].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[10016].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.2_Zen.[17172].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.2_Zen.[13436].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[21036].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.9_Zen.[23512].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.10_Zen.[9064].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[8004].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.3_Zen.[14380].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.5_Zen.[11116].log
2017-11-12T20:28:33-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.15_Zen.[9364].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[21668].log
2017-11-12T20:28:45-05:00 INFO Prospector with previous states loaded: 51
2017-11-12T20:28:45-05:00 INFO Starting prospector of type: log; id: 5126941041294342105 
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.1_Zen.[3460].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.2_Zen.[23156].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.6_Zen.[18980].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.8_Zen.[20600].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.12_Zen.[8072].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\Gateway\Sis.Gateway_Zen.[7252].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\Gateway\Sis.Gateway_Zen.[8476].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.1_Zen.[9292].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.5_Zen.[11700].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.5_Zen.[4996].log
2017-11-12T20:28:45-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.3_Zen.[3056].log
2017-11-12T20:28:53-05:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=44 filebeat.harvester.running=44 filebeat.harvester.started=44 libbeat.logstash.call_count.PublishEvents=2 libbeat.logstash.publish.read_bytes=780 libbeat.logstash.publish.write_bytes=156034 libbeat.logstash.published_and_acked_events=1294 libbeat.publisher.published_events=1294 publish.events=2119 registrar.states.update=181 registrar.writes=2
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.2_Zen.[24716].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.7_Zen.[9536].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.4_Zen.[18464].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.9_Zen.[7292].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.7_Zen.[23180].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.6_Zen.[11328].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.5_Zen.[14556].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.7_Zen.[23668].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.9_Zen.[10388].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.5_Zen.[27444].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.5_Zen.[10184].log
2017-11-12T20:28:57-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.6_Zen.[9152].log
2017-11-12T20:29:09-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.7_Zen.[10296].log
2017-11-12T20:29:09-05:00 INFO Prospector with previous states loaded: 65
2017-11-12T20:29:09-05:00 INFO Starting prospector of type: log; id: 281559010682130971 
2017-11-12T20:29:09-05:00 INFO Harvester started for file: D:\logs\UA01\RSever\Sis.RSever.1_Zen.[26784].log

I'm pretty sure the issue you see above is related to the number of files you have. Just doing a simple ls for 6 million files takes quite a bit of time and puts load on the disks.

Unfortunately I don't know how to check on Windows how many open file handles are allowed. Based on some googling, it also depends on whether you are on 32-bit or 64-bit.

In any case, I would encourage you to consider other ways of collecting and storing the files. Are all these files generated on one machine, or do you collect them from different machines? Could you run Filebeat on each machine? What is the format of the files? Are they generated per time frame? Could you move them somewhere else after they are sent out? Just some ideas on how to move forward here.

In general, I think the problem is not directly Filebeat, but rather that scanning 6 million files all the time is probably not sustainable, especially assuming the number of files keeps growing. It would be interesting to hear how long a simple ls over all these files takes on your system.

@ruflin Thank you for your tips.

I thought about starting several more Filebeat services.
I would like to know if you have in mind any side effects that may occur (apart from the obvious memory consumption).
I will now try to split my .yml into two processes to divide the work.
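The split I have in mind looks roughly like this; the file name and registry location below are hypothetical, but as far as I understand each instance must point at its own registry_file so the two don't overwrite each other's state:

      # filebeat-busy.yml - hypothetical second instance for the heaviest paths
      filebeat:
        # separate registry so it doesn't clash with the first instance
        registry_file: "C:/ProgramData/filebeat-busy/registry"
        prospectors:
          - paths:
              - D:\logs\*\R1_Output\R*\pid_*_rr_*
              - D:\logs\*\R2_Output\R*\pid_*_rr_*
            input_type: log
            document_type: path29
            multiline.pattern: '<\?xml version="1\.0" encoding="UTF-8"\?>'
            multiline.negate: true
            multiline.match: after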

I don't think running multiple Filebeat instances on the same machine will really help here, as I assume the limit is the machine/disk and not really Filebeat. If you copy the files from somewhere, I would encourage you to use two or more different machines.

Update:
I managed to solve the problem by starting more Filebeat services
with the following configurations:

      - paths:
          - D:\logs\*\pid_*_rr_*
        input_type: log
        document_type: A
        multiline.pattern: '<\?xml version="1\.0" encoding="UTF-8"\?>'
        multiline.negate: true
        multiline.match: after
        close_eof: true

Because of your warning, I checked the difference in the amount harvested, and it actually doubled.
I even took a day to check whether we are now harvesting all the files, and it seems we are (not 100% sure, but I checked the latest files on each path).

In conclusion: starting an extra service for special cases like this might be the solution (unless something changed in Filebeat 6.0.0, which we haven't upgraded to yet).

P.S.
Sorry for the delay; it took me time to verify my solution.
I also posted this question and answer on Stack Overflow for the common good: https://stackoverflow.com/q/47182771/8868108


Thanks for getting back on this one. It's very interesting to know that a second Filebeat instance helped here. It could be that two instances make better use of multiple processors, or access the disk differently.

I don't think there are any changes in 6.0 that would change anything here.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.