Filebeat doesn't clear the registry file after logrotation


#1

Hi,

OS: Linux CentOS 7
Filebeat: filebeat.x86_64 5.0.2-1

We decided to logrotate the logs that are send to ES. The logrotate is done and we want to track the the old file for 5 minutes to not loose any lines and then the file will be deleted from the server.

We are testing this at the moment and the issue is that the registry file is not deleted after the old log is removed. Also the registry is showing two logs with the same file

Filebeat.yml

filebeat.prospectors:
- input_type: log
  paths:
    - /root/demo01.log
  scan_frequency: 11s
  document_type: anna01
- input_type: log
  paths:
    - /root/demo01.log-*
  exclude_files: ['\.gz$']
  document_type: anna01
  ignore_older: 1m
  close_inactive: 3m
  close_removed: true
  clean_removed: true

So the demo01.log is the application log and demo01.log-YYYYMMMDD is the rotated log which will be removed by cron job.

yet even when I delete the demo01.log-YYYYMMDD, the registry file is not updated to reflect the deletion of the file

Please help, how to clean the registry file?

We tried different combination of close_inactive, close_removed, clean_inactive, clean_removed which still do not update the registry.


Filebeat use insane memory
(ruflin) #2

UPDATE: I think I spotted the problem in your config. A file should also after rotation be harvested by the same prospector. But you have defined 2 prospectors, and the first one will never close the file or send the clean_removed command.

So your config should probably look like:

filebeat.prospectors:
- input_type: log
  paths:
    - /root/demo01.log-*
  scan_frequency: 11s
  document_type: anna01
  exclude_files: ['\.gz$']
  ignore_older: 10m
  close_inactive: 3m
  close_removed: true
  clean_removed: true

PS: There is an issue with your above config in that ignore_older is smaller then close_inactive. This should not be the case.


#3

If I do what you suggested in filebeat.yml the registry is not created at all and the files are not send to ES.

When I will do one prospector and

paths:
- /root/demo01.log*

The files are send and when log rotates it adding the inode file to the registry but it doesn't clear the registry after.

What is the default amount of time that the registry will be cleared after the file is removed?

We also have a questions regarding the catering for files that are coming and going - can we switch that option off, or what is the right config for our old logs that are deleted permanently and we want to track them only for 2-5 minutes after they rotate to not loose any lines. We do not want to the registry go too large as well so we need a config that will track both files
-demo01.log
-demo01.log-YYYYMMDD

for 5 minutes and then when the second file is not updated, to be dropped by filebeat completely.

I am also afraid that if I will include ignore_older in the config for the normal demo01.log that it will close and not send the updates to the ES while application is still writing to it.


(ruflin) #4

You are right the correct path is /root/demo01.log* or even /root/demo01.*. Can you share some log files please on at least the info level? This should make it clear why files are not removed. The registry is cleared every time new data is added.

I think the config you are looking for is clean_inactive.

You can also set ignore_older to a much larger value, but is must be smaller then clean_inactive as otherwise the two will get into a race condition.

In your case I would think that setting ignore_older to 24 hours should work and clean_inactive to 36 hours for example as you seem to have daily logs.


#5

It is still not working, as the registry is still not clearing the old, removed log

this is the config now (I set up minutes for test purposes, it will not be the same for normal logs in Prod)

filebeat.prospectors:
- input_type: log
  paths:
    - /root/demo01.log*
  scan_frequency: 11s
  exclude_files: ['\.gz$']
  document_type: anna01
  ignore_older: 5m
  close_inactive: 3m
  clean_inactive: 7m
  close_removed: true
  clean_removed: true

This is the cycle we are testing on:

demo01.log is added to filebeat.yml and it is 5.7Mb (config above). I am adding anna01 index to ES, and then start filebeat without registry file. Log is sent, registry file created with one entry. Then I manually rotate the log with lilnux logrotate command, the demo01.log is renamed to demo01.log-YYYYMMDD and new file is created with the same privileges. Registry file changes to reflect the new file.
Log rotation command was issued : 10:21:10am

Then I remove the demo01.log-YYYYMMDD from the folder which in our situation will be done around an hour after the rotation. I added clean_inactive and set it more then ignore_older value.

demo01.log-YYYYMMDD was removed at 10:29:49am my time. I am assuming I need to wait 7m for the registry to be updated with the file deletion as this is the biggest value in the yml file

This is registry file at 10:45:18am (there is improvement as the deleted log is now with full file name in registry)

Filebeat log on INFO:

2016-12-08T10:18:15Z INFO Load previous states from registry into memory
2016-12-08T10:18:15Z INFO Previous states loaded: 0
2016-12-08T10:18:15Z INFO Loading Prospectors completed. Number of prospectors: 1
2016-12-08T10:18:15Z INFO All prospectors are initialised and running with 0 states to persist
2016-12-08T10:18:15Z INFO Starting prospector of type: log
2016-12-08T10:18:15Z INFO Harvester started for file: /root/demo01.log
2016-12-08T10:18:15Z INFO Metrics logging every 30s
2016-12-08T10:18:15Z INFO Starting Registrar
2016-12-08T10:18:15Z INFO Start sending events to output
2016-12-08T10:18:15Z INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2016-12-08T10:18:45Z INFO Non-zero metrics in the last 30s: registar.states.current=1 filebeat.harvester.open_files=1 publish.events=12574 registrar.states.update=12574 libbeat.logstash.call_count.PublishEvents=7 libbeat.publisher.published_events=12572 registrar.writes=7 libbeat.logstash.publish.write_bytes=832008 libbeat.logstash.published_and_acked_events=12572 filebeat.harvester.started=1 libbeat.logstash.publish.read_bytes=150 filebeat.harvester.running=1
2016-12-08T10:19:15Z INFO No non-zero metrics in the last 30s
2016-12-08T10:19:45Z INFO No non-zero metrics in the last 30s
2016-12-08T10:21:11Z INFO Harvester started for file: /root/demo01.log
2016-12-08T10:21:15Z INFO Non-zero metrics in the last 30s: filebeat.harvester.started=1 filebeat.harvester.open_files=1 filebeat.harvester.running=1 registrar.states.update=1 publish.events=1 registar.states.current=1 registrar.writes=1
2016-12-08T10:21:24Z INFO File is inactive: /root/demo01.log. Closing because close_inactive of 3m0s reached.
2016-12-08T10:24:45Z INFO Non-zero metrics in the last 30s: publish.events=1 registrar.states.update=1 filebeat.harvester.running=-1 filebeat.harvester.closed=1 registrar.writes=1 filebeat.harvester.open_files=-1
2016-12-08T10:25:15Z INFO No non-zero metrics in the last 30s
2016-12-08T10:25:45Z INFO No non-zero metrics in the last 30s
2016-12-08T10:26:15Z INFO No non-zero metrics in the last 30s
2016-12-08T10:26:45Z INFO No non-zero metrics in the last 30s
2016-12-08T10:27:15Z INFO No non-zero metrics in the last 30s
2016-12-08T10:27:45Z INFO No non-zero metrics in the last 30s
2016-12-08T10:28:15Z INFO No non-zero metrics in the last 30s
2016-12-08T10:28:45Z INFO No non-zero metrics in the last 30s
2016-12-08T10:29:15Z INFO No non-zero metrics in the last 30s

(ruflin) #6

Can you set the logging level to debug mode so we can see more details on what is happening?

You mentioned that you removed the file at 10:29:49, but the log file you pasted only goes until 10:29:15.

Please also paste registry entries as text as it is very hard to read and search screenshots.


#7

Registry file before the removal of the old log:

[{"source":"/root/demo01.log-20161212","offset":5920234,"FileStateOS":{"inode":9284449,"device":51713},"timestamp":"2016-12-12T15:31:19.039097522Z","ttl":420000000000},{"source":"/root/demo01.log","offset":0,"FileStateOS":{"inode":9284452,"device":51713},"timestamp":"2016-12-12T15:31:19.039098591Z","ttl":420000000000}]

Log set to debug

2016-12-12T15:29:04Z DBG Disable stderr logging
2016-12-12T15:29:04Z INFO Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2016-12-12T15:29:04Z INFO Setup Beat: filebeat; Version: 5.0.2
2016-12-12T15:29:04Z DBG Processors:
2016-12-12T15:29:04Z DBG Initializing output plugins
2016-12-12T15:29:04Z INFO Max Retries set to: 3
2016-12-12T15:29:04Z INFO Activated logstash as output plugin.
2016-12-12T15:29:04Z DBG Create output worker
2016-12-12T15:29:04Z DBG No output is defined to store the topology. The server fields might not be filled.
2016-12-12T15:29:04Z INFO Publisher name: SYS_ANNA-02
2016-12-12T15:29:04Z INFO Flush Interval set to: 1s
2016-12-12T15:29:04Z INFO Max Bulk Size set to: 2048
2016-12-12T15:29:04Z DBG create bulk processing worker (interval=1s, bulk size=2048)
2016-12-12T15:29:04Z INFO filebeat start running.
2016-12-12T15:29:04Z INFO Registry file set to: /var/lib/filebeat/registry
2016-12-12T15:29:04Z INFO Loading registrar data from /var/lib/filebeat/registry
2016-12-12T15:29:04Z INFO States Loaded from registrar: 1
2016-12-12T15:29:04Z INFO Loading Prospectors: 1
2016-12-12T15:29:04Z DBG exclude_files: [.gz$]
2016-12-12T15:29:04Z INFO Load previous states from registry into memory
2016-12-12T15:29:04Z INFO Previous states loaded: 1
2016-12-12T15:29:04Z DBG File Configs: [/root/demo01.log*]
2016-12-12T15:29:04Z INFO Loading Prospectors completed. Number of prospectors: 1
2016-12-12T15:29:04Z INFO All prospectors are initialised and running with 1 states to persist
2016-12-12T15:29:04Z DBG Starting prospector 0
2016-12-12T15:29:04Z INFO Starting prospector of type: log
2016-12-12T15:29:04Z DBG Start next scan
2016-12-12T15:29:04Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:29:04Z DBG Update existing file for harvesting: /root/demo01.log, offset: 5920234
2016-12-12T15:29:04Z DBG File didn't change: /root/demo01.log
2016-12-12T15:29:04Z DBG Prospector states cleaned up. Before: 1, After: 1


#8

2016-12-12T15:29:59Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:04Z INFO No non-zero metrics in the last 30s
2016-12-12T15:30:04Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:09Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:10Z DBG Run prospector
2016-12-12T15:30:10Z DBG Start next scan
2016-12-12T15:30:10Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:30:10Z DBG Update existing file for harvesting: /root/demo01.log, offset: 5920234
2016-12-12T15:30:10Z DBG File didn't change: /root/demo01.log
2016-12-12T15:30:10Z DBG Prospector states cleaned up. Before: 1, After: 1
2016-12-12T15:30:14Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:19Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:21Z DBG Run prospector
2016-12-12T15:30:21Z DBG Start next scan
2016-12-12T15:30:21Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:30:21Z DBG Update existing file for harvesting: /root/demo01.log, offset: 5920234
2016-12-12T15:30:21Z DBG File didn't change: /root/demo01.log
2016-12-12T15:30:21Z DBG Prospector states cleaned up. Before: 1, After: 1
2016-12-12T15:30:24Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:29Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:32Z DBG Run prospector
2016-12-12T15:30:32Z DBG Start next scan
2016-12-12T15:30:32Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:30:32Z DBG Update existing file for harvesting: /root/demo01.log, offset: 5920234
2016-12-12T15:30:32Z DBG File didn't change: /root/demo01.log
2016-12-12T15:30:32Z DBG Prospector states cleaned up. Before: 1, After: 1
2016-12-12T15:30:34Z INFO No non-zero metrics in the last 30s
2016-12-12T15:30:34Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:39Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:43Z DBG Run prospector
2016-12-12T15:30:43Z DBG Start next scan
2016-12-12T15:30:43Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:30:43Z DBG Update existing file for harvesting: /root/demo01.log, offset: 5920234
2016-12-12T15:30:43Z DBG File didn't change: /root/demo01.log
2016-12-12T15:30:43Z DBG Prospector states cleaned up. Before: 1, After: 1
2016-12-12T15:30:54Z DBG Run prospector
2016-12-12T15:30:54Z DBG Start next scan
2016-12-12T15:30:54Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:30:54Z DBG Update existing file for harvesting: /root/demo01.log, offset: 5920234
2016-12-12T15:30:54Z DBG File didn't change: /root/demo01.log
2016-12-12T15:30:54Z DBG Prospector states cleaned up. Before: 1, After: 1
2016-12-12T15:30:54Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:30:59Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:31:04Z INFO No non-zero metrics in the last 30s
2016-12-12T15:31:04Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:31:05Z DBG Run prospector
2016-12-12T15:31:05Z DBG Start next scan
2016-12-12T15:31:05Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:31:05Z DBG Update existing file for harvesting: /root/demo01.log, offset: 5920234
2016-12-12T15:31:05Z DBG File didn't change: /root/demo01.log
2016-12-12T15:31:05Z DBG Prospector states cleaned up. Before: 1, After: 1
2016-12-12T15:31:09Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:31:14Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:31:16Z DBG Run prospector
2016-12-12T15:31:16Z DBG Start next scan
2016-12-12T15:31:16Z DBG Check file for harvesting: /root/demo01.log-20161212
2016-12-12T15:31:16Z DBG Update existing file for harvesting: /root/demo01.log-20161212, offset: 5920234
2016-12-12T15:31:16Z DBG File rename was detected: /root/demo01.log -> /root/demo01.log-20161212, Current offset: 5920234
2016-12-12T15:31:16Z DBG Updating state for renamed file: /root/demo01.log -> /root/demo01.log-20161212, Current offset: 5920234
2016-12-12T15:31:16Z DBG File didn't change: /root/demo01.log-20161212
2016-12-12T15:31:16Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:31:16Z DBG Start harvester for new file: /root/demo01.log
2016-12-12T15:31:16Z DBG Setting offset for file based on seek: /root/demo01.log
2016-12-12T15:31:16Z DBG Setting offset for file: /root/demo01.log. Offset: 0
2016-12-12T15:31:16Z DBG New state added for /root/demo01.log
2016-12-12T15:31:16Z DBG Prospector states cleaned up. Before: 2, After: 2
2016-12-12T15:31:16Z INFO Harvester started for file: /root/demo01.log


#9

Old file removed @2016/12/12 15:34:21

{...}
2016-12-12T15:34:19Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:34:21Z INFO File is inactive: /root/demo01.log. Closing because close_inactive of 3m0s reached.
2016-12-12T15:34:21Z DBG Stopping harvester for file: /root/demo01.log
2016-12-12T15:34:21Z DBG Closing file: /root/demo01.log
2016-12-12T15:34:21Z DBG Update state: /root/demo01.log, offset: 0
2016-12-12T15:34:23Z DBG Run prospector
2016-12-12T15:34:23Z DBG Start next scan
2016-12-12T15:34:23Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:34:23Z DBG Update existing file for harvesting: /root/demo01.log, offset: 0
2016-12-12T15:34:23Z DBG File didn't change: /root/demo01.log
2016-12-12T15:34:23Z DBG Check file for harvesting: /root/demo01.log-20161212
2016-12-12T15:34:23Z DBG Ignore file because ignore_older reached: /root/demo01.log-20161212
2016-12-12T15:34:23Z DBG Prospector states cleaned up. Before: 2, After: 2
2016-12-12T15:34:24Z DBG Flushing spooler because of timeout. Events flushed: 1
2016-12-12T15:34:24Z DBG No events to publish
2016-12-12T15:34:24Z DBG Events sent: 1
2016-12-12T15:34:24Z DBG Processing 1 events
2016-12-12T15:34:24Z DBG Registrar states cleaned up. Before: 2 , After: 2
2016-12-12T15:34:24Z DBG Write registry file: /var/lib/filebeat/registry
2016-12-12T15:34:24Z DBG Registry file updated. 2 states written.
2016-12-12T15:34:29Z DBG Flushing spooler because of timeout. Events flushed: 0
2016-12-12T15:34:34Z INFO Non-zero metrics in the last 30s: publish.events=1 filebeat.harvester.running=-1 registrar.writes=1 filebeat.harvester.closed=1 registrar.states.update=1 filebeat.harvester.open_files=-1
2016-12-12T15:34:34Z DBG Run prospector
2016-12-12T15:34:34Z DBG Start next scan
2016-12-12T15:34:34Z DBG Check file for harvesting: /root/demo01.log
2016-12-12T15:34:34Z DBG Update existing file for harvesting: /root/demo01.log, offset: 0
2016-12-12T15:34:34Z DBG File didn't change: /root/demo01.log
2016-12-12T15:34:34Z DBG Prospector states cleaned up. Before: 2, After: 2
2016-12-12T15:34:34Z DBG Remove state for file as file removed: /root/demo01.log-20161212
2016-12-12T15:34:34Z DBG Flushing spooler because of timeout. Events flushed: 1
2016-12-12T15:34:34Z DBG No events to publish
2016-12-12T15:34:34Z DBG Events sent: 1
2016-12-12T15:34:34Z DBG Processing 1 events

When I turn on debug - the registry is updated,
I tried on a new machine and the problem repeated there, I am in the process of turning on the debug on the new machine to see if the registry will be updated to reflect deletion.


#10

UPDATE: With a certainty I can tell now that in our case when the debug is on, the registry file is behaving as expected. I verify this on 3 separate machines that are running filebeat in our environments.


(ruflin) #11

For me to understand: If you set debug to false, the registry is not updated, if you set it to true, the registry is updated?

Please post long log files into a gist instead of using multiple post and link the gist here.


#12

When filebeat.yml is like this:

logging.level: debug
logging.to_files: true
logging.to_syslog: false
logging.files:
  path: /var/log/mybeat
  name: mybeat.log
  keepfiles: 7

The registry is cleared, which is also reflected in the log

2016-12-12T15:34:34Z DBG Remove state for file as file removed: /root/demo01.log-20161212

and registry itself

[{"source":"/root/demo01.log","offset":0,"FileStateOS":{"inode":9284452,"device":51713},"timestamp":"2016-12-12T15:34:24.045962771Z","ttl":420000000000}]

When the logging.level is set to INFO, the registry is stuck with two entries. I tested it on three diffrent machines


(ruflin) #13

I'm trying to make sense of this. It's strange because logging and file rotation are not related at all (or shouldn't be). Could it be a timing issue? Means if you have debug enabled, it is a little bit slower and to process events it takes longer so the registry gets updated when the even already expired and it is cleaned up?

Can you try the following: Use INFO logging and do exactly what you did before. But then after the registry file is not updated, add a new file which is part of a prospector and not related to the other files. This should trigger a state update and clean up your states.

Also on shutdown states are cleaned up / rewritten. If you use info and then stop filebeat, do you get the correct registry?


#14

Ok, I did what you asked

log level on INFO

2017-01-18T12:44:41Z INFO Harvester started for file: root/demo01.log-anna
2017-01-18T12:44:41Z ERR Failed to publish events caused by: EOF
2017-01-18T12:44:41Z INFO Error publishing events (retrying): EOF
2017-01-18T12:45:06Z INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=11 libbeat.logstash.published_and_acked_events=11 libbeat.logstash.publish.write_bytes=3757 registar.states.current=1 registrar.writes=1 registrar.states.update=13 libbeat.logstash.publish.read_errors=1 filebeat.harvester.closed=1 publish.events=13 libbeat.logstash.call_count.PublishEvents=2 libbeat.logstash.publish.read_bytes=12 filebeat.harvester.started=1 libbeat.logstash.published_but_not_acked_events=11
2017-01-18T12:45:36Z INFO Non-zero metrics in the last 30s: publish.events=1 registrar.states.cleanup=1 registrar.writes=1 registrar.states.update=1 registar.states.current=-1
2017-01-18T12:46:06Z INFO No non-zero metrics in the last 30s

I had to create two files before the registry was cleared as it suppose to be, as linux is recycling inodes. So created demo01.log-test (inode recycled from removed rotated demo01.log-20170118) which was not removed from registry, but the I created demo01.log-anna (new inode) which was added to registry and then deleted demo01.log-test, registry was updated and now consists only of two files

[{"source":"/root/demo01.log","offset":0,"FileStateOS":{"inode":585191,"device":51713},"timestamp":"2017-01-18T12:44:42.430188205Z","ttl":420000000000},{"source":"/root/demo01.log-anna","offset":5325,"FileStateOS":{"inode":585201,"device":51713},"timestamp":"2017-01-18T12:47:46.426295634Z","ttl":420000000000}]

then removed demo01.log-anna, and registry was empty (expected as prospector for demo01.log was closed due to inactivity) then copy new lines to demo01.log. Registry looks good now

[{"source":"/root/demo01.log","offset":5325,"FileStateOS":{"inode":585191,"device":51713},"timestamp":"2017-01-18T12:55:02.477080955Z","ttl":420000000000}]

So is that proving your theory about time? How should I proceed to get the end result in the first rotation? Maybe it is a problem with Linux recycling inodes? The problem with rotation is that registry is recording two files with the same name demo01.log not one file demo01.log and demo01.log-20170118 (it was doing that before, so do not know what it is causing it)


(ruflin) #15

Unfortunately in the above we had the following influencing your tests:

It seems it was not possible to publish some events, so also the state will not be cleaned up. Do you see this error often in your log?

For the above: It is not not a problem having two files with the same name as long as the inodes are different. So I'm not sure if the above is really a problem for your use case. The registry is only relevant after a restart. Each prospectors keeps a separate list of states in memory which is updated after each scan (scan_frequency).


(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.