Hello,
For the second time, Filebeat has failed to recover cleanly from an uncontrolled reboot in production. After the reboot, Filebeat causes high iowait on the server and lags considerably in sending logs to the Elasticsearch backend.
After the reboot, the Filebeat logs report data corruption in the Filebeat registry:
WARN memlog/store.go:130 Incomplete or corrupted log file in /var/lib/filebeat/registry/filebeat. Continue with last known complete and consistent state. Reason: invalid character '\x00' looking for beginning of value
When we attempt to stop the service, it does not respond and is eventually killed with SIGKILL by systemd.
Is this a known issue? Is there anything we can do to prevent or recover from it? Is there any command we can run, should the issue occur again, to help analyze the root cause?
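In case it helps frame the question, here is a minimal sketch of the kind of check we could run against the registry before restarting the service. It assumes the registry log is a newline-delimited JSON file (we believe it is log.json inside the directory named in the warning, but that file name is an assumption on our side), and the function name find_corrupt_lines is just something we made up for illustration. It only reports which lines contain NUL bytes or fail to parse as JSON; it does not repair anything.

```python
import json


def find_corrupt_lines(path):
    """Return (line_number, reason) for every line that is not valid JSON.

    Assumes the file holds one JSON document per line (an assumption
    about the registry log format, not something we have confirmed).
    """
    problems = []
    # Read as bytes so NUL bytes left by an unclean shutdown are visible.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip(b"\r\n")
            if not line:
                continue  # skip blank lines
            if b"\x00" in line:
                count = line.count(b"\x00")
                problems.append((lineno, "contains %d NUL byte(s)" % count))
                continue
            try:
                json.loads(line.decode("utf-8"))
            except ValueError as exc:
                # Covers both invalid UTF-8 and invalid JSON.
                problems.append((lineno, str(exc)))
    return problems
```

Calling find_corrupt_lines on a copy of the registry log file (with Filebeat stopped) would return a list of line numbers and reasons, which we could attach to a bug report.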
Thank you,
Emmanuel