Filebeat causing a very large iowait and lagging after uncontrolled reboot

Hello,

For the second time, Filebeat has reacted badly to an uncontrolled reboot in production. It causes very high iowait on the server and falls considerably behind in sending logs to the Elasticsearch backend.

After the reboot, the logs mention data corruption in the Filebeat registry:

WARN memlog/store.go:130 Incomplete or corrupted log file in /var/lib/filebeat/registry/filebeat. Continue with last known complete and consistent state. Reason: invalid character '\x00' looking for beginning of value
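The warning points at NUL bytes ('\x00') where a JSON value was expected. To see where the registry log stops being valid, a minimal Python sketch along these lines may help. It assumes the registry directory contains a newline-delimited JSON file (for example log.json, which is an assumption about the on-disk layout) and that the file is copied somewhere safe before inspection. The function name scan_registry_log is hypothetical and is not part of Filebeat.

```python
import json


def scan_registry_log(data: bytes):
    """Inspect a newline-delimited JSON blob.

    Returns (valid_lines, first_bad_line_number, nul_byte_count).
    first_bad_line_number is None when every non-blank line parses.
    """
    nul_count = data.count(b"\x00")
    valid = 0
    first_bad = None
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        if not raw.strip():
            continue  # skip blank lines, including the one after the final newline
        try:
            # Decode explicitly so NUL bytes are not mistaken for UTF-16/32 input.
            json.loads(raw.decode("utf-8", errors="replace"))
            valid += 1
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError.
            if first_bad is None:
                first_bad = lineno
    return valid, first_bad, nul_count
```

Usage, on a copy of the file: read it in binary mode with open(path, "rb").read() and pass the bytes to scan_registry_log. A non-zero NUL count near the end of the file would be consistent with a write that was interrupted by the reboot, although this sketch alone cannot prove the cause.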

When we try to stop the service, it does not respond and is eventually killed with SIGKILL by systemd.

Is this a known issue? Is there anything we can do? Is there any command we can run, should the issue occur again, to assist with the analysis of the root cause in the future?

Thank you,

Emmanuel

I found a bug report about this issue: "High io consumption after sudden filebeat stop", Issue #35893 in elastic/beats on GitHub.

Unfortunately the issue hasn't been getting any attention.
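In the meantime, one way to collect evidence for such a report is to confirm that the Filebeat process itself generates the I/O. On Linux, /proc/<pid>/io exposes per-process counters such as read_bytes and write_bytes. The sketch below parses two samples of that file and computes the difference. The helper names parse_proc_io and io_delta are hypothetical, and reading the file usually requires root or the user that runs Filebeat.

```python
def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[key.strip()] = int(value)
    return counters


def io_delta(before: dict, after: dict) -> dict:
    """Per-counter difference between two samples of the same process."""
    return {key: after[key] - before[key] for key in after if key in before}
```

Usage: read /proc/<pid>/io for the Filebeat PID, wait a few seconds, read it again, and pass both parsed samples to io_delta. A write_bytes delta that keeps growing while few events are shipped would support the "high I/O after sudden stop" description, but it does not by itself identify the registry as the cause.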
