I am experiencing an issue with Filebeat in our Kubernetes cluster and would like to know if others have encountered a similar situation or have any insights into this behavior.
Issue: When network traffic from Filebeat to Logstash is blocked, the output metrics spike as expected. For example, the filebeat_libbeat_output{events="failed"}
metric shows a spike, indicating failed events. However, after roughly 10 minutes the metrics return to normal levels, even though the network block is still in place and the underlying issue remains unresolved.
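For context, this is roughly how I watch the behavior on a single pod (a minimal sketch, assuming Filebeat's HTTP monitoring endpoint is enabled with http.enabled: true on the default localhost:5066, and that the /stats JSON exposes the counter under libbeat.output.events.failed as I understand it):

```python
import json
import time
import urllib.request

STATS_URL = "http://localhost:5066/stats"  # assumes http.enabled: true in filebeat.yml
POLL_INTERVAL = 30  # seconds between samples

def failed_events() -> int:
    """Read the cumulative libbeat.output.events.failed counter from Filebeat."""
    with urllib.request.urlopen(STATS_URL, timeout=5) as resp:
        stats = json.load(resp)
    return stats["libbeat"]["output"]["events"]["failed"]

previous = failed_events()
while True:
    time.sleep(POLL_INTERVAL)
    current = failed_events()
    print(f"failed events in the last {POLL_INTERVAL}s: {current - previous}")
    previous = current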
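```

The per-interval delta spikes in the first few minutes of the block and then tails off toward zero, which matches what I see in the scraped metric.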
Expectation: I would expect the metrics to continue indicating an issue (i.e., remain at elevated levels) until the network block is removed and Filebeat can successfully send data to Logstash again.
Questions:
- Is this behavior expected from Filebeat's retry and backoff mechanism?
- Have others experienced similar issues with Filebeat metrics in a Kubernetes environment?
- If this behavior is expected, how do you recommend setting up alerts on these metrics so that an ongoing outage like this stays visible? I've included a sketch of what I'm considering below.
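What I'm considering so far, in case it helps frame the question, is alerting on the absence of acknowledged events rather than on the presence of failures, since the failure rate itself seems to die down as the backoff grows. Again, this is only a sketch against the same /stats endpoint; the field names, window, and thresholds are my own assumptions:

```python
import json
import time
import urllib.request

STATS_URL = "http://localhost:5066/stats"  # assumes http.enabled: true in filebeat.yml
WINDOW = 300  # seconds between the two samples the check compares

def output_events() -> dict:
    """Return the cumulative libbeat.output.events counters from Filebeat."""
    with urllib.request.urlopen(STATS_URL, timeout=5) as resp:
        return json.load(resp)["libbeat"]["output"]["events"]

def output_looks_blocked() -> bool:
    """Alert condition: events are being attempted or failing, but nothing is acked."""
    before = output_events()
    time.sleep(WINDOW)
    after = output_events()
    acked = after["acked"] - before["acked"]
    attempted = after["total"] - before["total"]
    failed = after["failed"] - before["failed"]
    # Fire when Filebeat tried (or failed) to send events but none were acknowledged.
    return acked == 0 and (attempted > 0 or failed > 0)

if __name__ == "__main__":
    if output_looks_blocked():
        print("ALERT: Filebeat output looks blocked (no acked events in the last window)")
```

The idea is that the acked counter stays flat for as long as the block is in place, even after the failed rate tails off, but I'm not sure whether that's the right signal to rely on.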
Any guidance or suggestions would be greatly appreciated.
Thank you!