Recovery mechanism in filebeat

Yes. Without an ACK, filebeat cannot tell whether logstash has received and processed the events, so it has to send them again. Logstash does no deduplication. This must be solved either at the protocol level (still tricky in the presence of load balancing) or via event deduplication (e.g. by having logstash generate an event id).

One potential solution for implementing deduplication via logstash + elasticsearch is the one I mentioned here: Detect filebeat retries to remove duplicates in the server side
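To make the idea concrete, here is a minimal sketch (not the linked solution itself) of deriving a deterministic event id from the event's content, so a resent copy maps to the same Elasticsearch `_id` and becomes an overwrite instead of a second document. The field names (`@timestamp`, `host`, `message`) are only assumptions for illustration; in practice you would compute this inside logstash (e.g. with a fingerprint filter) rather than in a separate script.

```python
import hashlib
import json

def event_fingerprint(event: dict, fields=("@timestamp", "host", "message")) -> str:
    """Derive a deterministic id from fields that are stable across resends.

    A resent copy of the same log line produces the same fingerprint, so
    indexing with this value as the Elasticsearch _id turns the duplicate
    into an overwrite instead of a new document.
    """
    stable = {f: event.get(f) for f in fields}          # assumed field names
    payload = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# A filebeat retry is byte-for-byte the same event, so both copies
# collide on the same id and only one document ends up in the index.
event = {"@timestamp": "2016-05-01T12:00:00Z", "host": "web-1",
         "message": "GET /index.html 200"}
assert event_fingerprint(event) == event_fingerprint(dict(event))
```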

The same 'problem' exists with other protocols/outputs (kafka, redis) as well, as there is no support for dealing with old resends. On the other hand, the advantage of at-least-once semantics is reduced bookkeeping, especially in the presence of load balancing.

In the common case (as long as the network is not unstable for too long), this is good enough. But be advised to monitor your systems and, if things get wonky, stop data ingestion (e.g. turn kafka off). For example, kafka is notorious for storing everything based on retention time without taking disk usage into account (to be fair, the behaviour is configurable), until the system breaks in bad ways. This is a general problem (design decision) in some systems. So you either drop events (not possible with filebeat) or stop data ingestion when systems become unresponsive and overloaded (taking X times the disk/CPU they normally do).

Trying to solve deduplication at the protocol level would require, e.g., sequence numbers so the server can detect resends (as TCP does to detect duplicate segments), plus consensus among servers in the presence of load balancing, so that a resend forwarded to another node (failover handling by the client) is still recognized. To deal with client/server restarts you would also want to log the sequence numbers to disk. And there goes scalability.
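For illustration only, a toy sketch of the single-server part of that bookkeeping: remember the highest sequence number accepted per client, persist it to disk so a restart does not re-accept old resends, and drop anything at or below it. The names and storage format here are made up, and the hard part, keeping these counters consistent across load-balanced servers, is deliberately left out, which is exactly the scalability cost mentioned above.

```python
import json
import os

class ResendFilter:
    """Drop events whose sequence number has already been accepted.

    Single-server sketch only: with load balancing, every server would
    additionally need to agree on (or share) these counters, which is the
    coordination overhead that hurts scalability.
    """

    def __init__(self, state_path: str):
        self.state_path = state_path
        # Persist counters so a server restart does not re-accept old resends.
        if os.path.exists(state_path):
            with open(state_path) as fh:
                self.last_seen = json.load(fh)
        else:
            self.last_seen = {}  # client_id -> highest accepted sequence number

    def accept(self, client_id: str, seq: int) -> bool:
        if seq <= self.last_seen.get(client_id, -1):
            return False  # duplicate or stale resend, drop it
        self.last_seen[client_id] = seq
        with open(self.state_path, "w") as fh:
            json.dump(self.last_seen, fh)  # fsync/atomic rename omitted for brevity
        return True
```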
