Filebeat 1.1.0: Log Collection Halting


#1

Hi,
I've started log collection for a cassandra cluster and am seeing some odd behaviour.
That is, when I start the service in debug mode I see the log lines being processed and published to the remote logstash endpoint. However, after a period of time filebeat will start to claim the the log file hasn't changed when then are clearly lines waiting to be processed. The only way to deal with this is to restart the service.

Config looks like this:

filebeat:
  prospectors:
  registry_file: /var/lib/filebeat/registry
  config_dir: /etc/filebeat/conf.d
output:
  logstash:
    hosts: ["logstash_host:5044"]
  console:
    pretty: True
shipper:
logging:
  to_files: True
  files:
    path: /var/log/filebeat
    name: filebeat.log
    rotateeverybytes: 104857600
    keepfiles: 5
  level: debug

Regards.
David


(Andrew Kroh) #2

Hi @dawiro, there was a critical bug in 1.1.0 that was fixed in 1.1.1. Maybe it's the problem you are seeing. Check out https://github.com/elastic/beats/pull/944.

You should update.


#3

I'll update to 1.2.1 and see how that goes....


#4

I've updated to 1.2.1 and am still seeing the same problem though, initially, it seems to run longer before needing to restart.


(Andrew Kroh) #5

Can you share the contents of /etc/filebeat/conf.d/ and your Filebeat log file with debug enabled? You can use http://pastebin.com if it won't all fit into a forum post.


#6

Prospector config is here:

http://pastebin.com/5jGTNYny

Log file extract is here:

http://pastebin.com/niaSrCRS


(Andrew Kroh) #7

Do you have the beginning of the log file? There are few things logged at startup that I would like to check.

And can you double check the installed version: filebeat -version

Thanks


#8

Have a look at this:

http://pastebin.com/gQJWiycS


(Steve Turner) #9

I'm having the same problem, running filebeat 1.1.1. In my case, the logs are being rolled hourly. Occasionally, after a rollover filebeat seems to get "stuck" and the log output fills with lines like:
2016/04/13 16:00:00.552320 log.go:157: INFO Read line error: file inactive
2016/04/13 16:00:00.746876 log.go:157: INFO Read line error: file inactive
2016/04/13 16:00:02.955370 log.go:157: INFO Read line error: file inactive
2016/04/13 16:00:03.071145 log.go:157: INFO Read line error: file inactive

This despite the fact that the logs are being written. Then, after hours of this, we'll get a burst of where it seems to try and restart, with logs like:
2016/04/13 20:07:28.604003 log.go:157: INFO Read line error: file inactive
2016/04/13 20:49:36.432179 log.go:157: INFO Read line error: file inactive
2016/04/13 20:49:56.540290 log.go:113: INFO Harvester started for file: /storage/bro/logs/current/dns.log
2016/04/13 20:49:56.540288 log.go:113: INFO Harvester started for file: /storage/bro/logs/current/weird.log
2016/04/13 20:49:56.540288 log.go:113: INFO Harvester started for file: /storage/bro/logs/current/known_services.log

But despite those "Harvester started" messages repeating at odd intervals, sometimes intermixed with the "file inactive" messages, no data flows. Finally, after restarting, this is what a startup log looks like:

http://pastebin.com/MUr4m7XC

And with that, we proceed to "normal" operation:

http://pastebin.com/HwYYa4z4


#10

I don't see this after a roll over. But it can happen after disturbingly short periods of time...


#11

I've had another look at this. I see this in the log immediately after the last event gets published:

2016-04-13T18:01:56+01:00 DBG output worker: publish 76 events
2016-04-13T18:01:56+01:00 DBG output worker: publish 76 events
2016-04-13T18:01:56+01:00 DBG Try to publish 76 events to logstash with window size 15
2016-04-13T18:01:56+01:00 ERR Fail to convert the event to JSON: reflect: call of reflect.Value.IsNil on zero Value
2016-04-13T18:01:56+01:00 INFO Error bulk publishing events: reflect: call of reflect.Value.IsNil on zero Value
2016-04-13T18:01:56+01:00 DBG 15 events out of 76 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 61 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG full line read
2016-04-13T18:01:56+01:00 DBG End of file reached: /var/log/cassandra/system.log; Backoff now.
2016-04-13T18:01:56+01:00 DBG 15 events out of 61 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 46 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 15 events out of 46 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 31 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 15 events out of 31 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 16 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 15 events out of 16 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG Try to publish 1 events to logstash with window size 15
2016-04-13T18:01:56+01:00 DBG 1 events out of 1 events sent to logstash. Continue sending ...
2016-04-13T18:01:56+01:00 DBG send completed
2016-04-13T18:01:57+01:00 DBG Start next scan
2016-04-13T18:01:57+01:00 DBG scan path /var/log/cassandra/*.log
2016-04-13T18:01:57+01:00 DBG Check file for harvesting: /var/log/cassandra/output.log
2016-04-13T18:01:57+01:00 DBG Update existing file for harvesting: /var/log/cassandra/output.log
2016-04-13T18:01:57+01:00 DBG Not harvesting, file didn't change: /var/log/cassandra/output.log
2016-04-13T18:01:57+01:00 DBG Check file for harvesting: /var/log/cassandra/system.log
2016-04-13T18:01:57+01:00 DBG Update existing file for harvesting: /var/log/cassandra/system.log
2016-04-13T18:01:57+01:00 DBG Not harvesting, file didn't change: /var/log/cassandra/system.log

...and then nothing is published until the service is restarted. From scanning publish events earlier in the log it seems that the error "Fail to convert the event to JSON" breaks log reading for that file...


(Steffen Siering) #12
2016-04-13T18:01:56+01:00 ERR Fail to convert the event to JSON: reflect: call of reflect.Value.IsNil on zero Value

Not sure if this is a bug in the standard library. Nightly builds do add the event to log output. Maybe this helps figuring out why json encoder trips.


#13

I'll try a nightly build on one server and see what that shows me...


#14

I've installed "filebeat_5.0.0-nightlylatest_amd64.deb". It ran fine for 30 minutes and then died silently...


#15

From what I see in the log it reaches the point where the JSON error occurs but instead of reporting an error, it dies...


#16

So given that the nightly build fails, what happens now?


(Steffen Siering) #17

this is super weird. I'd expect at least a stack-trace being printed if converting event to string failed.

I will need some sample logs and some shell script simulating the logrotation strategy implemented trying to reproduce issue. I haven't seen this problem before.

I didn't find an issue matching your case. Can you please open an issue with all details you have (logs, configs, versions)? Thanks.


#18

Ok, issue raised: https://github.com/elastic/beats/issues/1410


(system) #19