This is the 2nd time this has happened now. I have a prospector setup to read /var/log/myapp/*.log and it brings up about 5 or 6 harvesters as intended. Things run smooth for a while but what seems to be randomly it just stops harvesting one of the files. The file is continually being written to and rotates the same time as all the others. The other prospectors are running just fine and if I restart filebeat it picks the missing file backup and starts shipping again. I don't see any errors in the mybeat logs either.
Here's some log files that might help. Notice the timestamp. After this no info for that particular log is generated (starting harvester, stopping, etc)
2015-12-17T07:17:55Z ERR File reading error. Stopping harvester. Error: Stop harvesting as file is older then ignore_older: /var/log/myapp/mylog.log; Last change was: 10m5.015760004s
2015-12-17T07:53:45Z ERR File reading error. Stopping harvester. Error: Stop harvesting as file is older then ignore_older: /var/log/myapp/mylog.log; Last change was: 10m5.016134511s
/var/log/myapp/mylog.log: Again looking at the timestamps it should be picking this up. I redacted everything except the timestamp and loglevel just to show that the file is being written to.
2015-12-17 07:48:57.731427 [INFO]
2015-12-17 07:56:17.676947 [INFO]
2015-12-17 07:56:17.730149 [INFO]
2015-12-17 07:56:17.733369 [INFO]
2015-12-17 07:56:17.734734 [INFO]
2015-12-17 07:56:17.734842 [INFO]
2015-12-17 07:56:17.739996 [INFO]
Any ideas on what else I could look at? We are very close to pushing filebeat into production but this 'unknown' is holding it back. Thanks.
which filebeat version are you using.
problem is the ignore_older option. We set it to 24h by default. Don't know all in details, but this discussion thread (from message onward) seems to be related.
Hi we are currently using 1.0.0. I just updated to 1.0.1 though after the problem appeared. We have explicitly set the ignore_older option to 10m.
Even if the file is older then 10 min, once it's written to again won't the scan pick that up and start the harvester back up?
From this comment it seems ignore_older should be bigger the maximum update period, so files are not closed too soon and still picked up.
Can you try to run filebeat with -e -d 'prospector' to log some debug message (and other logs) to stderr?
After closing a file due to ignore_older prospector might generate different debug messages for this file:
- (always if last now - mod_time > ignore_older) Fetching old state of file to resume
- Resuming harvester on a previously harvested file
- Skipping file ...
- (if now - mod_time < ignore_older and file renamed) File rename was detected
- (if now - mod_time < ignore_older and no file rename)
- (if file is already known) Resuming harvester on a previously harvested file
- (if new file) Launching harvester on new file
The timestamps on your log-file match the actual log event timestamp. Filebeat uses the file modification time. If you know a file its name being affected in advance you can try this line of bash to get unique mod-time updates at about every second:
$ logfile=...; while true; do sleep 1; ls -l --full-time $logfile; done | uniq
QQ regarding ignore_older and logrotate. If I increase ignore_older to something like 1hr and the log file in question gets logrotated, will the prospector pick up the new file right away and start shipping or will it have to wait 1hr for the rotated one to close before picking up the new file?
the rotated file will be picked up I think. But still, the old one will be tried to read until ignore_older. You can try force_close_files option to close the fd before hitting ignore_older at the cost of missing lines not yet read when file rotation happens.
The new file will be picked up directly, latest after