Using RabbitMQ as broker between Beats and Logstash

NasrJBr · May 11, 2024, 11:36am

Hello, I’ve designed an architecture for a SIEM using ELK, with a RabbitMQ broker between Beats and Logstash. My final target is to collect logs from at least 1000 Beats and forward them through RabbitMQ to be processed by Logstash. The broker is there to avoid a bottleneck between Beats and Logstash when there are too many logs. I chose RabbitMQ instead of Kafka because I only need to forward logs from Beats to Logstash and I wanted to avoid the complexity of a Kafka deployment. My question is: is RabbitMQ a good choice in this scenario? What are your recommendations?
Thank you

ashishtiwari1993 · May 11, 2024, 1:05pm

Hi @NasrJBr,

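You can point Filebeat's output directly at Logstash. There is no need for RabbitMQ unless you have a specific requirement. Logstash can also queue events persistently on disk if its persistent queue is enabled (`queue.type: persisted`; the default queue is in memory).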

Christian_Dahlqvist · May 11, 2024, 1:09pm

As far as I know, Beats are not able to output directly to RabbitMQ, so I do not think this will work. Either deploy Kafka or send the data directly to Logstash. Logstash is able to queue data on disk, but because that is local disk it does not give you the resiliency that Kafka would offer.

leandrojmp · May 11, 2024, 1:57pm

I would say that it is not, mainly because Beats cannot output data directly to RabbitMQ, so you would need another piece to manage in your infrastructure.

Kafka is a way better choice, as you can output from Beats directly into a Kafka topic.

I've been using Kafka as a message broker in ELK deployments for so many years that every time I need to spin up a new Elastic Cluster I consider Kafka as an essential part of it.

NasrJBr · May 11, 2024, 2:38pm

Hi @ashishtiwari1993, won’t there be a risk of data loss from Beats if there’s a connection or performance issue?

NasrJBr · May 11, 2024, 2:39pm

Hi @Christian_Dahlqvist, I believe RabbitMQ is mentioned in the official documentation.

NasrJBr · May 11, 2024, 2:43pm

Hi @leandrojmp, okay, so Kafka is better than RabbitMQ for handling these types of problems. Initially, I considered direct ingestion, but I took potential bottlenecks and data loss into consideration, so I added a broker to my architecture.

Christian_Dahlqvist · May 11, 2024, 2:43pm

The link I provided shows the supported outputs, and RabbitMQ is not on it. There is however a module for collecting logs (possibly also metrics) from RabbitMQ, but that is very different.

NasrJBr · May 11, 2024, 2:50pm

Ah, okay, so RabbitMQ appears in the documentation in a different role. So from your perspective, Kafka is the better choice over RabbitMQ, since my final objective is to eliminate bottlenecks and data loss?

Christian_Dahlqvist · May 11, 2024, 2:53pm

Yes, I would recommend using Kafka, as I have seen it used successfully in deployments with very high throughput numbers. I am not sure how RabbitMQ performance compares, but I believe it is significantly slower.

NasrJBr · May 11, 2024, 3:00pm

Ok, thank you. One more thing: is there any alternative to a broker for handling the two problems (bottlenecks and data loss)?

Christian_Dahlqvist · May 11, 2024, 3:01pm

No, I think using Kafka is the best option. It is a very common pattern and therefore easy to get help with.

NasrJBr · May 11, 2024, 3:03pm

Ok, thank you for your help @Christian_Dahlqvist.

ashishtiwari1993 · May 11, 2024, 4:42pm

Filebeat guarantees that events will be delivered to the configured output at least once and with no data loss. Filebeat is able to achieve this behavior because it stores the delivery state of each event in the registry file.

Once data is delivered to Logstash, Logstash writes it to disk (with the persistent queue enabled), so there is no data loss.

In case of high traffic, you can add more Logstash servers behind a load balancer.
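To make the at-least-once idea concrete, here is a hypothetical toy model in Python. It is not Filebeat's code (Filebeat's real registry tracks a read offset per file), and the `ship` function and its outcome labels are made up for illustration. The only rule it encodes is the one described above: the persisted offset advances only after the output acknowledges a batch, so a failed or unacknowledged batch is re-sent rather than dropped. That same rule is why duplicates, not gaps, are the failure mode. The Filebeat documentation also lists caveats around log rotation and files deleted while the output is unavailable.

```python
"""Toy model of at-least-once shipping with a persisted read offset.

Hypothetical illustration only: not Filebeat's implementation or registry format.
"""


def ship(lines, batch_size, outcomes):
    """Send `lines` in batches; `outcomes` scripts what happens on each attempt.

    Each outcome is "ok" (stored and acknowledged), "fail" (connection error,
    nothing stored) or "ack_lost" (stored, but the ack never reached the
    shipper). Attempts beyond the scripted ones succeed.
    """
    offset = 0        # stands in for the persisted registry offset
    received = []     # what the receiver (e.g. Logstash) has stored
    script = iter(outcomes)
    while offset < len(lines):
        batch = lines[offset:offset + batch_size]
        outcome = next(script, "ok")
        if outcome == "fail":
            continue                  # offset unchanged: the same batch is retried
        received.extend(batch)        # receiver stored the batch
        if outcome == "ack_lost":
            continue                  # no ack seen, so the batch is sent again
        offset += len(batch)          # ack received: only now advance the offset
    return received, offset


if __name__ == "__main__":
    events = ["e1", "e2", "e3", "e4"]
    got, final_offset = ship(events, 2, ["ok", "fail", "ack_lost", "ok"])
    print(got, final_offset)  # ['e1', 'e2', 'e3', 'e4', 'e3', 'e4'] 4
```

Every event arrives, but `e3` and `e4` arrive twice because their first ack was lost.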

NasrJBr · May 11, 2024, 4:49pm

Hi @ashishtiwari1993, instead of using a broker, can I use a load balancer between Beats and Logstash? Do all Beats support data loss prevention like Filebeat? Also, can you recommend some load balancers for this SIEM use case?

Christian_Dahlqvist · May 11, 2024, 4:57pm

When you write data to Kafka, the data is replicated across the cluster (given a replication factor above 1), and the loss of a single node does not lead to data loss. Logstash only persists data to its local disk, so losing a Logstash node would likely result in data loss.
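A small hypothetical sketch of why replication matters here. It uses a simplified round-robin replica placement (not Kafka's actual assignment logic) and treats replication factor 1 as the equivalent of a single Logstash node's local disk. It assumes every write was fully replicated before the failure; in a real setup that depends on producer acks (Filebeat's Kafka output has `required_acks`, where -1 waits for all replicas) and the topic's `min.insync.replicas`.

```python
"""Why replicated storage survives a node loss while a single local disk does not.

Simplified round-robin replica placement; Kafka's real assignment logic differs.
"""


def place_replicas(num_partitions, brokers, replication_factor):
    """Put each partition's replicas on `replication_factor` distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def readable_after_failure(placement, failed_broker):
    """Partitions that still have at least one replica on a live broker."""
    return {
        p for p, replicas in placement.items()
        if any(b != failed_broker for b in replicas)
    }


if __name__ == "__main__":
    brokers = ["b1", "b2", "b3"]
    rf3 = place_replicas(6, brokers, 3)
    rf1 = place_replicas(6, brokers, 1)  # like one node's local disk
    print(sorted(readable_after_failure(rf3, "b2")))  # [0, 1, 2, 3, 4, 5]
    print(sorted(readable_after_failure(rf1, "b2")))  # [0, 2, 3, 5]
```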

Another benefit of using Kafka is that it helps distribute the processing load evenly across processes pulling from it. With a large number of Logstash instances you do run the risk of having the Logstash instances very unevenly loaded.
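To illustrate the uneven-load point, here is a hypothetical model with made-up rates. "Pinned" stands in for any setup where a Beat's whole stream ends up on one Logstash instance (for example, one long-lived connection per Beat through a connection-level load balancer). "Partitioned" assumes the producer spreads events round-robin over the topic's partitions (Filebeat's Kafka output has a `partition` setting for choosing a strategy), and that partitions are then split across the consuming Logstash instances.

```python
"""Static pinning vs spreading events over partitions (toy numbers).

Hypothetical model: rates are invented, and "pinned" assumes each Beat sends
all of its events to a single Logstash instance.
"""


def pinned_load(source_rates, num_instances):
    """Each source sends everything to one instance (source i -> i % n)."""
    load = [0] * num_instances
    for i, rate in enumerate(source_rates):
        load[i % num_instances] += rate
    return load


def partitioned_load(source_rates, num_partitions, num_consumers):
    """Events go round-robin over partitions; partitions are split over consumers."""
    partitions = [0] * num_partitions
    next_partition = 0
    for rate in source_rates:
        for _ in range(rate):
            partitions[next_partition] += 1
            next_partition = (next_partition + 1) % num_partitions
    load = [0] * num_consumers
    for p, count in enumerate(partitions):
        load[p % num_consumers] += count
    return load


if __name__ == "__main__":
    rates = [900, 50, 30, 20]  # one very chatty source, three quiet ones
    print(pinned_load(rates, 2))          # [930, 70]
    print(partitioned_load(rates, 4, 2))  # [500, 500]
```

With pinning, the instance that happens to receive the chatty source does almost all the work; with partitioning, the load follows the partitions rather than the sources.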

Using Kafka is therefore in my opinion the superior option.

ashishtiwari1993 · May 11, 2024, 5:03pm

Agreed!!! For a distributed architecture, Kafka is the best. Just curious: couldn't distributing the same events lead to data duplication (if we're pushing to the same Elasticsearch cluster)?

leandrojmp · May 11, 2024, 5:37pm

It depends on how they are distributed. With Kafka this is not an issue, as you can use the same group id for every Logstash instance; each event is then consumed by only one of them, so the events will not be duplicated.

A good approach is to make the number of partitions in the Kafka topic the same as the number of Logstash nodes. This way each Logstash node will consume from one partition and the events will be evenly distributed.
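A hypothetical toy model of the two points above, using a simplified round-robin assignor (not Kafka's actual assignment strategy; consumer and group names are made up). Within one consumer group each partition has exactly one owner, so events are read once; separate groups each read everything. If there are more consumers than partitions, the extras sit idle. In the Logstash Kafka input the relevant settings are `group_id` (default `logstash`) and `consumer_threads`, and each thread counts as a group member, so the partition count should match the total threads across all Logstash nodes. Kafka can still redeliver some events after a failure or rebalance, so this removes systematic duplication, not every possible duplicate.

```python
"""Toy model of Kafka consumer groups (simplified round-robin assignor).

Not Kafka's actual assignment strategy; consumer and group names are hypothetical.
"""


def assign(partitions, consumers):
    """Within ONE group, each partition is owned by exactly one consumer."""
    owned = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        owned[consumers[i % len(consumers)]].append(p)
    return owned


def copies_per_partition(partitions, groups):
    """How many times each partition's events are read across all groups."""
    copies = {p: 0 for p in partitions}
    for consumers in groups.values():
        for owned_partitions in assign(partitions, consumers).values():
            for p in owned_partitions:
                copies[p] += 1
    return copies


if __name__ == "__main__":
    parts = [0, 1, 2, 3]
    same_group = {"logstash": ["ls1", "ls2", "ls3", "ls4"]}
    two_groups = {"logstash-a": ["ls1", "ls2"], "logstash-b": ["ls3", "ls4"]}
    print(assign(parts, ["ls1", "ls2", "ls3", "ls4"]))  # one partition each
    print(copies_per_partition(parts, same_group))  # every partition read once
    print(copies_per_partition(parts, two_groups))  # every partition read twice
    print(assign(parts, [f"ls{i}" for i in range(1, 7)]))  # ls5 and ls6 stay idle
```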

NasrJBr · May 12, 2024, 12:03pm

When it comes to data loss prevention, a load balancer can be inefficient when Beats agents overload Logstash. Therefore, the best solution is to use a broker. My question is: what is the best configuration for a Kafka broker with Logstash in the context of a SIEM?

Christian_Dahlqvist · May 12, 2024, 1:17pm

Why would this be different? You are still shipping data and want to do so in a performant and reliable fashion.
