Filebeat flooding logstash

tgdesrochers · May 14, 2018, 11:42pm

Filebeat and logstash 6.2

I have 18k+ devices running Filebeat and sending to Logstash. My Logstash is in AWS and is fronted by a classic ELB. I terminate SSL on the ELB and forward to the correct port for Logstash.

I had an incident where my Logstash nodes were unhealthy and the port became unavailable. I have 8 Logstash nodes, and eventually they all became unavailable because they were being overworked. This caused OOM errors, and the nodes couldn't process any more. The issue is that the Filebeat agents were still trying to connect.

On the client side we saw Filebeat connect to the ELB and SSL was terminated, but Logstash closed the connection. Immediately afterwards Filebeat tried again. This happened many times per second per host, and with 18k hosts doing this it used a lot of network bandwidth.

How can I configure Filebeat to back off exponentially when its connection is gracefully closed, but this happens many times per second?

I understand that Filebeat tries to reopen the connection so it can optimize latency, but the issue here is that when Logstash was closing the connections because it couldn't handle the load, the agents effectively DDoS'd Logstash.

Are there any best practices for rate-limiting Filebeat? What could be done in this type of scenario? If I were to move the SSL handoff to Logstash instead of the ELB, would that change the behavior of Filebeat?

Thank you. I'm trying to take lessons learned from this incident to build a more reliable and resilient pipeline.

pierhugues · May 15, 2018, 1:18pm

@tgdesrochers Filebeat uses an exponential backoff to reconnect when an error occurs. I think the problem in this case is that each of the 18k hosts is retrying to send a full batch, and this is causing the OOM. I think configuring Filebeat with slow_start would help if a cascading error happens.
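
As a rough sketch of what that could look like in filebeat.yml (the hostname and port are placeholders I made up, and the bulk_max_size value is just the documented default, so adjust both to your setup):

```yaml
# filebeat.yml (sketch only; the endpoint below is hypothetical, not from this thread)
output.logstash:
  # Placeholder for the ELB endpoint in front of the Logstash nodes.
  hosts: ["logstash-elb.example.com:5044"]

  # Send only a subset of each batch per transaction at first. The count
  # grows toward bulk_max_size while sends succeed and shrinks on error.
  slow_start: true

  # Upper bound on events per request (2048 is the documented default).
  bulk_max_size: 2048
```

With slow_start enabled, a host that reconnects after a failure should not push a full batch straight away, which may ease the load on Logstash nodes that are recovering. Lowering bulk_max_size is a separate lever you could try if batches are still too large.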

We plan to refactor the exponential backoff strategy. I will check whether we can expose some knobs so that users can tweak the behavior.

