Filebeat 1.1.0: Log Collection Halting

dawiro · April 12, 2016, 2:45pm

Hi,
I've started log collection for a cassandra cluster and am seeing some odd behaviour.
That is, when I start the service in debug mode I see the log lines being processed and published to the remote logstash endpoint. However, after a period of time filebeat will start to claim the the log file hasn't changed when then are clearly lines waiting to be processed. The only way to deal with this is to restart the service.

Config looks like this:

filebeat:
  prospectors:
  registry_file: /var/lib/filebeat/registry
  config_dir: /etc/filebeat/conf.d
output:
  logstash:
    hosts: ["logstash_host:5044"]
  console:
    pretty: True
shipper:
logging:
  to_files: True
  files:
    path: /var/log/filebeat
    name: filebeat.log
    rotateeverybytes: 104857600
    keepfiles: 5
  level: debug

Regards.
David

andrewkroh · April 12, 2016, 3:56pm

Hi @dawiro, there was a critical bug in 1.1.0 that was fixed in 1.1.1. Maybe it's the problem you are seeing. Check out https://github.com/elastic/beats/pull/944.

You should update.

dawiro · April 13, 2016, 10:14am

I'll update to 1.2.1 and see how that goes....

dawiro · April 13, 2016, 1:36pm

I've updated to 1.2.1 and am still seeing the same problem though, initially, it seems to run longer before needing to restart.

andrewkroh · April 13, 2016, 2:05pm

Can you share the contents of /etc/filebeat/conf.d/ and your Filebeat log file with debug enabled? You can use http://pastebin.com if it won't all fit into a forum post.

dawiro · April 13, 2016, 2:48pm

Prospector config is here:

http://pastebin.com/5jGTNYny

Log file extract is here:

http://pastebin.com/niaSrCRS

andrewkroh · April 13, 2016, 3:02pm

Do you have the beginning of the log file? There are few things logged at startup that I would like to check.

And can you double check the installed version: filebeat -version

Thanks

dawiro · April 13, 2016, 3:14pm

Have a look at this:

http://pastebin.com/gQJWiycS

swturner · April 14, 2016, 3:31am

I'm having the same problem, running filebeat 1.1.1. In my case, the logs are being rolled hourly. Occasionally, after a rollover filebeat seems to get "stuck" and the log output fills with lines like:
2016/04/13 16:00:00.552320 log.go:157: INFO Read line error: file inactive
2016/04/13 16:00:00.746876 log.go:157: INFO Read line error: file inactive
2016/04/13 16:00:02.955370 log.go:157: INFO Read line error: file inactive
2016/04/13 16:00:03.071145 log.go:157: INFO Read line error: file inactive

This despite the fact that the logs are being written. Then, after hours of this, we'll get a burst of where it seems to try and restart, with logs like:
2016/04/13 20:07:28.604003 log.go:157: INFO Read line error: file inactive
2016/04/13 20:49:36.432179 log.go:157: INFO Read line error: file inactive
2016/04/13 20:49:56.540290 log.go:113: INFO Harvester started for file: /storage/bro/logs/current/dns.log
2016/04/13 20:49:56.540288 log.go:113: INFO Harvester started for file: /storage/bro/logs/current/weird.log
2016/04/13 20:49:56.540288 log.go:113: INFO Harvester started for file: /storage/bro/logs/current/known_services.log

But despite those "Harvester started" messages repeating at odd intervals, sometimes intermixed with the "file inactive" messages, no data flows. Finally, after restarting, this is what a startup log looks like:

http://pastebin.com/MUr4m7XC

And with that, we proceed to "normal" operation:

http://pastebin.com/HwYYa4z4

dawiro · April 14, 2016, 7:05am

I don't see this after a roll over. But it can happen after disturbingly short periods of time...

dawiro · April 14, 2016, 7:19am

I've had another look at this. I see this in the log immediately after the last event gets published:

2016-04-13T18:01:56+01:00 DBG output worker: publish 76 events
2016-04-13T18:01:56+01:00 DBG output worker: publish 76 events
2016-04-13T18:01:56+01:00 DBG Try to publish 76 events to logstash with window size 15
2016-04-13T18:01:56+01:00 ERR Fail to convert the event to JSON: reflect: call of reflect.Value.IsNil on zero Value
2016-04-13T18:01:56+01:00 INFO Error bulk publishing events: reflect: call of reflect.Value.IsNil on zero Value
2016-04-13T18:01:56+01:00 DBG 15 events out of 76 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 61 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG End of file reached: /var/log/cassandra/system.log; Backoff now.
2016-04-13T18:01:56+01:00 DBG 15 events out of 61 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 46 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 15 events out of 46 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 31 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 15 events out of 31 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 16 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 15 events out of 16 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 1 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 1 events out of 1 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG send completed
2016-04-13T18:01:57+01:00 DBG Start next scan
2016-04-13T18:01:57+01:00 DBG scan path /var/log/cassandra/*.log
2016-04-13T18:01:57+01:00 DBG Check file for harvesting: /var/log/cassandra/output.log
2016-04-13T18:01:57+01:00 DBG Update existing file for harvesting: /var/log/cassandra/output.log
2016-04-13T18:01:57+01:00 DBG Not harvesting, file didn't change: /var/log/cassandra/output.log
2016-04-13T18:01:57+01:00 DBG Check file for harvesting: /var/log/cassandra/system.log
2016-04-13T18:01:57+01:00 DBG Update existing file for harvesting: /var/log/cassandra/system.log
2016-04-13T18:01:57+01:00 DBG Not harvesting, file didn't change: /var/log/cassandra/system.log

...and then nothing is published until the service is restarted. From scanning publish events earlier in the log it seems that the error "Fail to convert the event to JSON" breaks log reading for that file...

steffens · April 14, 2016, 11:51am

2016-04-13T18:01:56+01:00 ERR Fail to convert the event to JSON: reflect: call of reflect.Value.IsNil on zero Value

Not sure if this is a bug in the standard library. Nightly builds do add the event to log output. Maybe this helps figuring out why json encoder trips.

dawiro · April 14, 2016, 1:37pm

I'll try a nightly build on one server and see what that shows me...

dawiro · April 14, 2016, 1:53pm

I've installed "filebeat_5.0.0-nightlylatest_amd64.deb". It ran fine for 30 minutes and then died silently...

dawiro · April 14, 2016, 2:26pm

From what I see in the log it reaches the point where the JSON error occurs but instead of reporting an error, it dies...

dawiro · April 15, 2016, 6:23am

So given that the nightly build fails, what happens now?

steffens · April 15, 2016, 1:53pm

this is super weird. I'd expect at least a stack-trace being printed if converting event to string failed.

I will need some sample logs and some shell script simulating the logrotation strategy implemented trying to reproduce issue. I haven't seen this problem before.

I didn't find an issue matching your case. Can you please open an issue with all details you have (logs, configs, versions)? Thanks.

dawiro · April 18, 2016, 9:27am

Ok, issue raised: https://github.com/elastic/beats/issues/1410

Topic		Replies	Views
Filebeat stopped working Beats filebeat	6	4244	July 5, 2017
Filebeat missed one complete log file in log rotation Beats filebeat	10	1511	May 17, 2017
Filebeat stops harvesting (error file didn't change) but tail -f is ok! Beats filebeat	1	1457	October 12, 2016
Filebeat upgrade requiring multiple restarts Beats filebeat	10	1505	February 7, 2017
Filebeat missing files Beats filebeat	19	5689	October 11, 2016

Filebeat 1.1.0: Log Collection Halting

Related topics