Filebeat 5.0 not processing files after restart


(Ori Rubinfeld) #1

Hi,

I have Filebeat 5.0 (GA) installed on Windows machine.
After doing a restart to Filebeat, it is not processing any log files.
It has a registry and registry.old files.
After deleting those files and restarting service it start to process again.

Is it a known issue ?

Ori


(ruflin) #2

This issue is not known. Can you share the log files after the restart?


(Ori Rubinfeld) #3

Hi Ruflin,

Here are some messages when stopping Filebeat:

2016-11-10T09:34:53+02:00 INFO Stopping filebeat
2016-11-10T09:34:54+02:00 INFO Prospector outlet closed
2016-11-10T09:34:54+02:00 ERR Problem putting initial state: prospector outlet closed
2016-11-10T09:34:55+02:00 DBG Flushing spooler because of timeout. Events flushed: 1
2016-11-10T09:34:55+02:00 DBG No events to publish
2016-11-10T09:34:55+02:00 DBG Events sent: 1
2016-11-10T09:34:55+02:00 DBG Processing 1 events
2016-11-10T09:34:55+02:00 DBG Registrar states cleaned up. Before: 148 , After: 148
2016-11-10T09:34:55+02:00 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
2016-11-10T09:34:55+02:00 DBG Registry file updated. 148 states written.
2016-11-10T09:34:57+02:00 INFO Prospector outlet closed
2016-11-10T09:34:57+02:00 ERR Problem putting initial state: prospector outlet closed
2016-11-10T09:34:59+02:00 INFO Prospector outlet closed
2016-11-10T09:34:59+02:00 ERR Problem putting initial state: prospector outlet closed
2016-11-10T09:35:00+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:35:02+02:00 INFO Prospector outlet closed
2016-11-10T09:35:02+02:00 ERR Problem putting initial state: prospector outlet closed
2016-11-10T09:35:04+02:00 INFO Non-zero metrics in the last 30s: registrar.states.update=9 publish.events=9 registrar.writes=5
2016-11-10T09:35:04+02:00 INFO Prospector outlet closed
2016-11-10T09:35:04+02:00 ERR Problem putting initial state: prospector outlet closed
.
.

Ori


(Ori Rubinfeld) #4

Here Are some messages after starting it

Here are after starting:

09:35:25 DBG Flushing spooler because of timeout. Events flushed: 2
09:35:25 DBG No events to publish
09:35:25 DBG Events sent: 2
09:35:25 DBG Processing 2 events
09:35:25 DBG Registrar states cleaned up. Before: 148 , After: 148
09:35:25 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:35:25 DBG Registry file updated. 148 states written.
09:35:30 DBG Flushing spooler because of timeout. Events flushed: 1
09:35:30 DBG No events to publish
09:35:30 DBG Events sent: 1
09:35:30 DBG Processing 1 events
09:35:30 DBG Registrar states cleaned up. Before: 148 , After: 148
09:35:30 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:35:30 DBG Registry file updated. 148 states written.
09:35:40 DBG Flushing spooler because of timeout. Events flushed: 0
09:35:44 INFO Non-zero metrics in the last 30s: libbeat.es.publish.read_bytes=804 registrar.writes=2 publish.events=3 registrar.states.update=3 libbeat.es.publish.write_bytes=468 registar.states.current=148
09:36:10 DBG Flushing spooler because of timeout. Events flushed: 0
09:36:14 INFO No non-zero metrics in the last 30s
09:36:15 DBG Flushing spooler because of timeout. Events flushed: 1
09:36:15 DBG No events to publish
09:36:15 DBG Events sent: 1
09:36:15 DBG Processing 1 events
09:36:40 DBG Registrar states cleaned up. Before: 148 , After: 148
09:36:40 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:36:40 DBG Registry file updated. 148 states written.
09:36:44 INFO Non-zero metrics in the last 30s: publish.events=5 registrar.writes=5 registrar.states.update=5
09:36:45 DBG Flushing spooler because of timeout. Events flushed: 2
09:36:45 DBG No events to publish
09:36:45 DBG Events sent: 2
09:36:45 DBG Processing 2 events
09:36:45 DBG Registrar states cleaned up. Before: 148 , After: 148
09:36:45 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:36:45 DBG Registry file updated. 148 states written.
09:36:50 DBG Flushing spooler because of timeout. Events flushed: 0
09:36:55 DBG Flushing spooler because of timeout. Events flushed: 2
09:36:55 DBG No events to publish
09:36:55 DBG Events sent: 2
09:37:05 DBG Registrar states cleaned up. Before: 148 , After: 148
09:37:05 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:37:05 DBG Registry file updated. 148 states written.
09:37:10 DBG Flushing spooler because of timeout. Events flushed: 2
09:37:10 DBG No events to publish
09:37:10 DBG Events sent: 2
09:37:10 DBG Processing 2 events
09:37:10 DBG Registrar states cleaned up. Before: 148 , After: 148
09:37:10 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:37:10 DBG Registry file updated. 148 states written.
09:37:14 INFO Non-zero metrics in the last 30s: registrar.states.update=10 registrar.writes=5 publish.events=10
09:37:15 DBG Flushing spooler because of timeout. Events flushed: 2
09:37:15 DBG No events to publish
09:37:15 DBG Events sent: 2
09:37:15 DBG Processing 2 events
09:37:20 DBG Registrar states cleaned up. Before: 148 , After: 148
09:37:20 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:37:20 DBG Registry file updated. 148 states written.
09:37:25 DBG Flushing spooler because of timeout. Events flushed: 1
09:37:25 DBG No events to publish
09:37:25 DBG Events sent: 1
09:37:25 DBG Processing 1 events

Continues with:

09:40:50 DBG Registrar states cleaned up. Before: 148 , After: 148
09:40:50 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:40:50 DBG Registry file updated. 148 states written.
09:41:10 DBG Flushing spooler because of timeout. Events flushed: 0
09:41:14 INFO Non-zero metrics in the last 30s: publish.events=1 registrar.states.update=1 registrar.writes=1
09:41:40 DBG Flushing spooler because of timeout. Events flushed: 0
09:41:41 INFO Previous states loaded: 148
09:41:41 DBG File Configs: [\qa-app01\Logs/AppLogs/RT*.log*]
09:41:41 DBG exclude_files: []
09:41:41 INFO Load previous states from registry into memory
09:41:44 INFO No non-zero metrics in the last 30s
09:42:05 DBG Flushing spooler because of timeout. Events flushed: 0
09:42:10 DBG Flushing spooler because of timeout. Events flushed: 1
09:42:10 DBG No events to publish
09:42:10 DBG Events sent: 1
09:42:10 DBG Processing 1 events
09:42:10 DBG Registrar states cleaned up. Before: 148 , After: 148
09:42:10 DBG Write registry file: c:/Program Files/Filebeat/registry/registry_applogs
09:42:10 DBG Registry file updated. 148 states written.
09:42:14 INFO Non-zero metrics in the last 30s: publish.events=1 registrar.writes=1 registrar.states.update=1
09:42:15 DBG Flushing spooler because of timeout. Events flushed: 0
09:42:25 DBG Flushing spooler because of timeout. Events flushed: 1

Ori


(Ori Rubinfeld) #5

Then after a while, continues only with:

2016-11-10T09:48:45+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:48:50+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:48:55+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:00+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:05+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:10+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:14+02:00 INFO No non-zero metrics in the last 30s
2016-11-10T09:49:15+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:20+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:25+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:30+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:35+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:40+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:44+02:00 INFO No non-zero metrics in the last 30s
2016-11-10T09:49:45+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:50+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:49:55+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:00+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:05+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:10+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:14+02:00 INFO No non-zero metrics in the last 30s
2016-11-10T09:50:15+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:20+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:25+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:30+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:35+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:40+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:44+02:00 INFO No non-zero metrics in the last 30s
2016-11-10T09:50:45+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:50+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:50:55+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:00+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:05+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:10+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:14+02:00 INFO No non-zero metrics in the last 30s
2016-11-10T09:51:15+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:20+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:25+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:30+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:35+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:40+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:44+02:00 INFO No non-zero metrics in the last 30s
2016-11-10T09:51:45+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:50+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:51:55+02:00 DBG Flushing spooler because of timeout. Events flushed: 0
2016-11-10T09:52:00+02:00 DBG Flushing spooler because of timeout. Events flushed: 0

Until I stop and Delete registry file.

Ori


(ruflin) #6

Could you check the modification time of the files that are not harvested? Is it in the past or in sync with the most recent update?

I'm kind of worried / confused by this line here: Problem putting initial state: prospector outlet closed. As this only applies when starting the prospectors, means when you shutdown filebeat not all prospectors were fully started?!? Can you share your config file?

For sharing logs, best use a gist and link it here. This make logs easier readable and does not required multiple posts.


(Ori Rubinfeld) #7

Here is a config file, it is the same for all filebeat processes.
The same phenomena occurred for all of them.

filebeat:
  registry_file: c:/Program Files/Filebeat/registry/registry_applogs
  prospectors:
    # AppLogs
    -
      paths:
        - \\qa-app01\Logs/AppLogs/RT*.log*
      input_type: log
      multiline:
        pattern: ^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[[:space:]][[:digit:]]{1,2},[[:space:]][[:digit:]]{4}[[:space:]][[:digit:]]{1,2}:[[:digit:]]{2}:[[:digit:]]{2}[[:space:]](AM|PM)
        negate: true
        match: after
      exclude_lines: [ ^\* ]
      ignore_older: 2h
      close_inactive: 20m
      document_type: applogs
      harvester_buffer_size: 16384
      encoding: plain
      fields_under_root: true
      fields:
        app_env: E1
        type_id: 110
        time_to_keep: 43200
        hostname: QA-APP01

    # AppLogs
    -
      paths:
        - \\qa-app02\Logs/AppLogs/RT*.log*
      input_type: log
      multiline:
        pattern: ^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[[:space:]][[:digit:]]{1,2},[[:space:]][[:digit:]]{4}[[:space:]][[:digit:]]{1,2}:[[:digit:]]{2}:[[:digit:]]{2}[[:space:]](AM|PM)
        negate: true
        match: after
      exclude_lines: [ ^\* ]
      ignore_older: 2h
      close_inactive: 20m
      document_type: applogs
      harvester_buffer_size: 16384
      encoding: plain
      fields_under_root: true
      fields:
        app_env: E1
        type_id: 110
        time_to_keep: 43200
        hostname: QA-APP02

    # AppLogs
    -
      paths:
        - \\qa-app03\Logs/AppLogs/RT*.log*
      input_type: log
      multiline:
        pattern: ^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[[:space:]][[:digit:]]{1,2},[[:space:]][[:digit:]]{4}[[:space:]][[:digit:]]{1,2}:[[:digit:]]{2}:[[:digit:]]{2}[[:space:]](AM|PM)
        negate: true
        match: after
      exclude_lines: [ ^\* ]
      ignore_older: 2h
      close_inactive: 20m
      document_type: applogs
      harvester_buffer_size: 16384
      encoding: plain
      fields_under_root: true
      fields:
        app_env: E0
        type_id: 110
        time_to_keep: 43200
        hostname: QA-APP03

    # AppLogs
    -
      paths:
        - \\qa-app04\Logs/AppLogs/RT*.log*
      input_type: log
      multiline:
        pattern: ^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[[:space:]][[:digit:]]{1,2},[[:space:]][[:digit:]]{4}[[:space:]][[:digit:]]{1,2}:[[:digit:]]{2}:[[:digit:]]{2}[[:space:]](AM|PM)
        negate: true
        match: after
      exclude_lines: [ ^\* ]
      ignore_older: 2h
      close_inactive: 20m
      document_type: applogs
      harvester_buffer_size: 16384
      encoding: plain
      fields_under_root: true
      fields:
        app_env: E0
        type_id: 110
        time_to_keep: 43200
        hostname: QA-APP04

logging:
  level: debug
  to_files: true
  to_syslog: false
  files:
    path: c:/Program Files/Filebeat/logs
    name: filebeat_applogs.log
    rotateeverybytes: 20971520
    keepfiles: 100

output:
#  file:
#    path: c:/Program Files/Filebeat/output
#    filename: filebeatdata_applogs.txt
#    rotate_every_kb: 10240
#    number_of_files: 10
  elasticsearch:
   hosts: ["qaelk06:9200", "qaelk07:9200"]
   index: "application-logs"
   ## bulk_max_size: 50, Default
   bulk_max_size: 10000
   ## flush_interval, Default unknown
   flush_interval: 60
   parameters.pipeline: "app-applogs-pipeline"

Ori


(Ori Rubinfeld) #8

Modification date of the log file is OK.
Compatible with its content.

I read that with Filebeat 5.0, it will be less dependent on the modification date, in order to read the log files and harvesting them.

Ori


(ruflin) #9

Yes, filebeat now mostly depends on the file size and not moddate anymore (except for ignore_older). But looking at the config this should not matter here.

Could you share the full log after restarting filebeat, especially the first 100 lines. How do you stop / start filebeat?

How many events do you have per second across your log files? I'm interested because you have quite a large flush_interval defined.


(Ori Rubinfeld) #10

Hi Ruflin,

I am using the service control of Windows.

Here is a gist link for the log:

There can be 10K-20K Events per seconds, but mostly there are not as that much, can be around 3K-4K.
We do have more than 20 types of logs that are being sent from one central Filebeat server (which reads them using FQDN).
Each Log type has its own Filebeat service.

Thanks,

Ori


(ruflin) #11

Unfortunately the log file you share does not contain the startup part. There should be quite a few messages inside about setting up etc. Can you share that part too?


(Ori Rubinfeld) #12

Hi Ruflin,

Here is a link to a new log file:

Thanks,

Ori


(ruflin) #13

Thanks for sharing the full log file. It seems like it does not even see the files. Perhaps this has something to do with the changes we made in 5.0 for the patterns. Your glob patterns mixes forward and backward slashes. Could you try to only use backword or only forward slashes?

- \\qa-app01\Logs/AppLogs/RT*.log*

(Ori Rubinfeld) #14

Hi Ruflin,

Tried both, It did not change.
Still Filebeat did not take any log file.

Ori


(ruflin) #15

Thanks for trying this out. If I get you right, if you start filebeat for the first time, it picks up the files and reading works. Stopping filebeat and restarting it does not pick up the anymore. This would mean non of the prospectors seems to be "responsible" for the files. There were some changes in 5.0.0-rc1 where each prospector only tried to pick up the files that match the pattern. I kind of assume something is going wrong here. This part was improved in master (5.1). There are some special behaviours happening in the glob pattern on Windows, but under 5.0 it should just work, as in case it does not know what to do with it, it will just pick it up.

Any chance you could try the latest build? https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/ I assume filebeat is still reading from network / shared drives?


(Ori Rubinfeld) #16

Yes, you are correct, this is the exact scenario.
With Filebeat 5.0 Alpha 4 and Alpha 5, it did not happen.
Did not tested RC1.
Data is still being read from network.

I will try the latest build and update.

Ori


(ruflin) #17

Good to know it worked with alpha4 and alph5. This is definitively an indication that there is something wrong with the new scan :frowning: Let me know if it works with the latest build.


(ruflin) #18

@ori.rubinfeld Could you provide an excerpt form the registry file? Because there are the paths based on which filebeat will compare the previous paths (in 5.0).


(Ori Rubinfeld) #19

Hi,

Here is a gist for it:

Thanks,

Ori


(ruflin) #20

Thanks for the registry. Did you have a chance to test the nightly build?