Increasing resiliency of an on-prem Elastic stack implementation

Hi!

We have an on-prem setup running 8.15.1

We use this to index/visualize logs coming off of a firewall

The setup is like this -
Firewall --(Syslog)--> Syslog Server --(file input)--> Filebeat --> Redis --> Logstash --> Elasticsearch cluster

Recently we had a 5x spike in logs from the firewall. This led to the Redis queue being overwhelmed and becoming unresponsive.

We have managed to trace the cause of the spike and address the issue. However, this got me thinking about whether there are any protections in place to prevent such an event from overwhelming the setup again.

Here is where I am hoping to hear from the members here. Is there any way we can:
a) set a limit on the number of events Filebeat pushes into Redis per second?
b) do this without dropping the events above the limit (since the events are being read off of a file and not a stream)?

thanks in advance.

cheers!

Hello and welcome,

If for some reason Filebeat cannot output events, it will back off on the input; since it is reading from a file, it will simply stop reading until it can output events again.

It has an internal queue, which is in memory by default, and when this queue is full it does not accept new events until events in the queue start being sent to the output again.
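As a sketch, the knobs that control this backpressure are the memory queue size and the Redis output batch size. The values and host name below are illustrative placeholders, not recommendations; tune them for your event rate:

```yaml
# filebeat.yml -- illustrative values only
queue.mem:
  events: 4096            # max events buffered in memory; inputs block when full
  flush.min_events: 512   # publish once this many events are buffered...
  flush.timeout: 5s       # ...or after this long, whichever comes first

output.redis:
  hosts: ["redis-host:6379"]   # hypothetical Redis host
  key: "filebeat"
  bulk_max_size: 2048          # max events sent to Redis per batch
```

A smaller queue makes Filebeat push back on the file input sooner, which is effectively the "limit without dropping" behaviour asked about, since unread data just stays in the file.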

When reading from files this normally does not lead to data loss, but in some specific cases it can happen, for example when the file rotates and the old file is deleted before Filebeat has finished reading it.
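If rotation-related loss during a backlog is a concern, the filestream input exposes options for this; a hedged sketch (the path and id are placeholders, and you should verify these options against the docs for your version):

```yaml
filebeat.inputs:
  - type: filestream
    id: syslog-files                       # hypothetical input id
    paths:
      - /var/log/syslog-server/*.log       # hypothetical path
    # Keep the file handle open even after the file is removed, so a
    # rotated-and-deleted file can still be drained during backpressure.
    close.on_state_change.removed: false
    clean_removed: false
```

The trade-off is that open handles to deleted files hold disk space until Filebeat finishes with them.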

Are you required to use Redis? I have a similar setup, but using Kafka.

I had some issues with Redis in the past and decided to switch to Kafka. It adds a little more management overhead, but in my case it performs far better.

What I have is:

Firewalls -> Logstash (tcp input, kafka output) -> Kafka -> Logstash (Kafka input, elasticsearch output).
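A minimal sketch of the first Logstash hop in that layout; the port, broker address, and topic name are placeholders for illustration:

```
# Ingest pipeline: receive syslog over TCP, buffer into Kafka
input {
  tcp {
    port  => 5514          # hypothetical syslog listener port
    codec => plain
  }
}
output {
  kafka {
    bootstrap_servers => "kafka1:9092"   # hypothetical Kafka broker
    topic_id          => "firewall-logs" # hypothetical topic
    codec             => json
  }
}
```

The second Logstash instance then consumes from the same topic with the kafka input and writes to Elasticsearch, so Kafka absorbs any spike between the two.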