I have a fairly large fleet of Filebeat instances, all of which ship events to a Kafka cluster with multiple brokers; the events are then consumed by Logstash. I am currently researching how to configure Filebeat to ship events to Kafka more efficiently.
I was looking at round_robin.group_events, since I use round_robin as the partitioning strategy, expecting to see an improvement in efficiency, but I haven't noticed any difference between different values of round_robin.group_events. For example, I set group_events as high as 5000 on several servers running Filebeat, and while shipping ~50,000 events per second from those servers to Kafka, performance was practically the same as with the default group_events value of 1.
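For reference, this is roughly the output configuration I'm testing (the broker hostnames and topic name below are placeholders, not my actual values):

```yaml
output.kafka:
  # Placeholder broker addresses and topic
  hosts: ["kafka-broker-1:9092", "kafka-broker-2:9092", "kafka-broker-3:9092"]
  topic: "filebeat-logs"
  partition.round_robin:
    # Distribute events across all partitions, not only reachable ones
    reachable_only: false
    # Number of events sent to the same partition before the
    # round-robin partitioner switches to the next partition
    group_events: 5000
```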
The metrics I monitored, both on the Filebeat servers and on the Kafka brokers, were the number of batches sent to Kafka over a given period, resource utilization (CPU, RAM, disk I/O), and network performance (e.g. number of connections, packet size). All of them look essentially identical, as if the setting has no effect at all.
In the absence of detailed documentation for round_robin.group_events, I was wondering whether anyone can explain how this setting affects performance and in which metrics I should expect to see an improvement.