Filebeat causing a very large iowait and lagging after uncontrolled reboot

emmanuel_t · January 29, 2024, 10:53am

Hello,

For the second time now, Filebeat has failed to recover cleanly from an uncontrolled reboot in production. After the reboot it causes very high iowait on the server and falls considerably behind in sending logs to the Elasticsearch backend.

After the reboot, the Filebeat logs report data corruption in the registry:

WARN memlog/store.go:130 Incomplete or corrupted log file in /var/lib/filebeat/registry/filebeat. Continue with last known complete and consistent state. Reason: invalid character '\x00' looking for beginning of value

When we try to stop the service, it does not respond and is eventually killed with SIGKILL by systemd.

Is this a known issue? Is there anything we can do? Is there any command we can run, should the issue occur again, to assist with the analysis of the root cause in the future?
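
To make the last question concrete, here is a minimal sketch of the kind of check we have in mind. It is not an existing Filebeat tool. It assumes the registry log is a newline-delimited JSON file, and `find_corrupt_lines` is a hypothetical helper name. It only reports which lines contain NUL bytes or otherwise fail to parse, which is what the warning above complains about.

```python
import json


def find_corrupt_lines(data: bytes):
    """Return (line_number, reason) for every line that is not valid JSON.

    Assumption: the registry log holds one JSON value per line.
    """
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. the one after a trailing newline
        if b"\x00" in line:
            # NUL bytes match the "invalid character '\x00'" warning
            count = line.count(b"\x00")
            problems.append((number, f"{count} NUL byte(s)"))
            continue
        try:
            json.loads(line)
        except ValueError as exc:  # covers JSON and UTF-8 decoding errors
            problems.append((number, f"invalid JSON: {exc}"))
    return problems
```

We would run this against a copy of the registry log taken right after the reboot and before restarting Filebeat, for example `find_corrupt_lines(pathlib.Path(copy_path).read_bytes())`, where `copy_path` is wherever the copy was saved. The exact file name inside /var/lib/filebeat/registry/filebeat may differ between Filebeat versions, so we have not hard-coded it.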

Thank you,

Emmanuel

emmanuel_t · February 12, 2024, 9:12am

I found a bug report that describes this issue: High io consumption after sudden filebeat stop · Issue #35893 · elastic/beats · GitHub

Unfortunately, that issue has not received any attention.

system · March 11, 2024, 11:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
