This depends on the volume of data you have and for how long you want this data to be available in the broker.
In most cases there is basically zero need to tune Kafka or Logstash; just keep in mind to use more than one partition on the Kafka side, and ideally have the number of partitions equal to the number of Logstash nodes.
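As an illustration only (broker, topic, and group names here are placeholders, not from any real setup), each Logstash node would run a Kafka input with the same `group_id`, so Kafka's consumer-group assignment spreads the partitions across the nodes:

```
# Sketch of a Logstash Kafka input; all names are illustrative placeholders.
# With e.g. 4 partitions and 4 Logstash nodes running this same input,
# each node in the "logstash" consumer group is assigned one partition.
input {
  kafka {
    bootstrap_servers => "kafka-01:9092,kafka-02:9092"
    topics            => ["filebeat-logs"]
    group_id          => "logstash"
    consumer_threads  => 1   # keep total threads across nodes <= partition count
  }
}
```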
You need to start small and improve as needed; it is pretty common to see people trying to start with the optimal configuration for every tool, which is a mistake in my opinion.
I want to create some scenarios and choose the optimal one for my use case, since I already have some configurations to compare, but expertise plays an important role in this kind of configuration.
Have you thought about using Redis? We sit Redis between our Filebeat servers and Logstash instances.
We have a bank of single-instance Redis servers that we load balance over from Filebeat. Our Logstash servers are set up with the Logstash Redis input configured to read from any of the Redis instances.
For us it works really well and supports the ingestion of around 2 billion log lines a day.
This is how we configure it, but please bear in mind Leandro's last points.
We use a bank of single-instance Redis servers. In the example here, we have three Redis instances. We configure Filebeat to round robin to our Redis instances as follows:
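A minimal sketch of the Filebeat side (hostnames, port, and the Redis key are placeholders, not our actual values) uses the `output.redis` section with `loadbalance` enabled so events are spread across all listed instances:

```yaml
# Filebeat -> Redis, load-balanced across three instances.
# Hostnames, port, and key are illustrative placeholders.
output.redis:
  hosts: ["redis-01:6379", "redis-02:6379", "redis-03:6379"]
  key: "filebeat"        # Redis list that Filebeat pushes events onto
  loadbalance: true      # distribute published events across all hosts
  timeout: 5
```

On the Logstash side, a sketch of the matching Redis input (again with placeholder hostnames) defines one `redis` input block per instance, all reading the same list key:

```
# Logstash pipeline input: pop events from the same "filebeat" list on each
# Redis instance. Hostnames and key are illustrative placeholders.
input {
  redis {
    host      => "redis-01"
    data_type => "list"
    key       => "filebeat"
  }
  redis {
    host      => "redis-02"
    data_type => "list"
    key       => "filebeat"
  }
  redis {
    host      => "redis-03"
    data_type => "list"
    key       => "filebeat"
  }
}
```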