Filebeat Fails after Power Failure


(Andrew) #1

Hey All, very weird issue. Hoping for some guidance/assistance here.

Issue: Filebeat 5.1.1 (per the log below) running on CentOS Linux release 7.3.1611 (Core). The server suffered a power failure when its host failed. The shutdown was ungraceful, and the VM runs off a remote QNAP via iSCSI. The VM recovers fine but filebeat doesn't start. Error message below:

2017-02-05T19:21:25+13:00 INFO Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2017-02-05T19:21:25+13:00 INFO Setup Beat: filebeat; Version: 5.1.1
2017-02-05T19:21:25+13:00 INFO Max Retries set to: 3
2017-02-05T19:21:25+13:00 INFO Activated logstash as output plugin.
2017-02-05T19:21:25+13:00 INFO Publisher name: mon.testdomain.com
2017-02-05T19:21:25+13:00 INFO Flush Interval set to: 1s
2017-02-05T19:21:25+13:00 INFO Max Bulk Size set to: 2048
2017-02-05T19:21:25+13:00 INFO filebeat start running.
2017-02-05T19:21:25+13:00 INFO Registry file set to: /var/lib/filebeat/registry
2017-02-05T19:21:25+13:00 INFO Loading registrar data from /var/lib/filebeat/registry
2017-02-05T19:21:25+13:00 INFO Total non-zero values:
2017-02-05T19:21:25+13:00 INFO Uptime: 9.9703ms
2017-02-05T19:21:25+13:00 INFO filebeat stopped.
2017-02-05T19:21:25+13:00 CRIT Exiting: Could not start registrar: Error loading state: Error decoding states: EOF

/var/lib/filebeat/registry is empty after the power failure and filebeat does not recover.

systemctl start filebeat yields the same error.

I'm assuming that filebeat isn't recovering because the registry file is empty and hence the EOF error?
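
A minimal sketch for checking that assumption and getting the agent unstuck. It assumes the 5.x registry is a single JSON file (which the "Error decoding states" message suggests) and that filebeat is stopped while it runs. The function names are made up for illustration and are not part of filebeat. Moving the registry aside discards the stored read offsets, so filebeat would likely re-read and re-send log lines it had already shipped.

```python
import json
import os
import shutil
import time


def registry_problem(path):
    """Return a short description of what is wrong with the registry, or None."""
    if not os.path.exists(path):
        return None  # no registry at all; not the failure described here
    if os.path.getsize(path) == 0:
        return "empty file"  # matches the EOF error in the log above
    try:
        with open(path, "r", encoding="utf-8") as fh:
            json.load(fh)
    except ValueError as exc:  # covers JSON and encoding errors
        return "invalid JSON: %s" % exc
    return None


def quarantine_registry(path):
    """Move a damaged registry aside; return the backup path, or None if it looked fine."""
    if registry_problem(path) is None:
        return None
    backup = "%s.corrupt-%d" % (path, int(time.time()))
    shutil.move(path, backup)  # keep the damaged file for the bug report
    return backup
```

With the path from the log above this would be called as quarantine_registry("/var/lib/filebeat/registry"), followed by systemctl start filebeat.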

Question: How do you remediate this issue and ensure that the registry file stays intact in the event of an ungraceful failure? Is this a bug in the filebeat agent? Is it worth raising a ticket on GitHub? Both servers that have filebeat enabled experienced the same failure, and neither agent starts at boot or after a manual start.

Thoughts appreciated, no idea where to look next.

Cheers
Andy


(ruflin) #2

I have seen a similar case in the past with virtual machines, but so far it is unclear to me what exactly happens and how the registry file can end up empty. Can you open a GitHub issue for that?

Please share the following there:

  • File system you are using
  • Virtual machine types
  • Log file of filebeat before crashing
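
For the file system item, a small sketch that looks up which mount holds the filebeat data path. It assumes a Linux host where /proc/mounts is readable; filesystem_for is a hypothetical helper name. The mounts table is passed in as text so the function can be tried without touching the real system.

```python
import os


def filesystem_for(path, mounts_text):
    """Return (device, mount_point, fs_type) for the mount that contains `path`."""
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[0], fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Longest mount point wins; on a tie the later entry shadows the earlier one.
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fs_type)
    return best
```

On the affected VM this could be run as filesystem_for("/var/lib/filebeat", open("/proc/mounts").read()).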

The log file before crashing could potentially show if something went wrong during writing the registry.

I would expect an empty registry after a crash to be an edge case, as the crash would have to happen exactly while the file is being written. But it seems it happened on both of your machines, so it is kind of reproducible?
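
To illustrate what "crash during writing" can mean, here is a sketch of the general write-then-rename pattern for a state file. It is not filebeat's actual implementation, and write_state_atomically is a hypothetical name. One common way to end up with a zero-length file after power loss is that the file was created or renamed but its data was never flushed to disk; the fsync calls below are the usual guard against that. It only helps if the storage stack underneath honours flush requests.

```python
import json
import os
import tempfile


def write_state_atomically(path, states):
    """Replace `path` with a JSON dump of `states`; a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix=".registry-", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(states, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data to disk before the rename
        os.replace(tmp_path, path)  # atomic swap on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a power loss (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the first fsync is the risky variant: the rename can reach the disk before the file contents do, which would leave an empty file under the final name.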


(Andrew) #3

Thanks Ruflin, I have migrated this issue to GitHub.

The log file has been attached to the GitHub issue. It doesn't show anything prior to the failure, but logging was set to the default level, so I'm unsure whether it will give you the detail you need to troubleshoot. Please let me know if there is any additional information I can collect that may assist you.
I haven't been able to "reproduce" the issue conventionally; it is just an observation that both CentOS 7 VMs experienced the same failure in the latest power outage. As the QNAP iSCSI connection fails when the host goes offline, this may complicate the issue further. Keeping that in the back of my mind.


(ruflin) #4

Thanks for filing the issue. I will continue the discussion there.

