Hardware requirements for Logstash

What would be the most suitable hardware configuration for Logstash? We are reading data from Apache Kafka, applying a few filters in the Logstash config file to the different sources of data, and then sending the data on to Elasticsearch (a minimal sketch of such a pipeline is shown after the list below).

  • We are using around 7 different sources mapped to 7 topics in Kafka, and all of them are parsed through Logstash.
  • We will be reading about 4 TB of data per day.
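For context, here is a minimal sketch of one such pipeline; the broker addresses, topic, filter, and index names are all placeholders, and the actual filters differ per source:

```
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"   # placeholder brokers
    topics            => ["app-logs"]                # one of the 7 topics
    group_id          => "logstash-app-logs"         # consumer group
    codec             => "json"
  }
}

filter {
  # Example of the kind of light per-source parsing applied.
  date {
    match  => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["http://es1:9200"]                     # placeholder node
    index => "app-logs-%{+YYYY.MM.dd}"               # daily index
  }
}
```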

As I am new to this, I need help understanding how many servers I need to allocate to Logstash for stable transformation of the data without any interruptions.

That will depend on the volume of data and the complexity of the processing. Configure a server with the pipelines that you need and measure how quickly it processes the data, then scale out the number of servers to increase the throughput as needed. One way to measure the processing rate is shown below.
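For example, assuming the default API port of 9600, you can sample the pipeline counters from Logstash's monitoring API and derive a rough events/second figure:

```
# Sample the pipeline stats twice, 60 seconds apart; the difference in
# events.out divided by 60 is an approximate events/second throughput.
curl -s 'localhost:9600/_node/stats/pipelines?pretty'
sleep 60
curl -s 'localhost:9600/_node/stats/pipelines?pretty'
```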


Since you are reading from Kafka, your consumer parallelism will be limited by the number of partitions in your topics.

A single Logstash instance can scale by configuring multiple workers for the pipeline (one worker per vCPU).
For HA you will need a consumer group (multiple Logstash agents can form a single consumer group).
Assuming your Kafka topic A has 8 partitions, you could use 4 Logstash hosts with 2 consumer threads each for that input, so the 8 consumers match the 8 partitions; a sketch of those settings follows.
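As a sketch, the relevant input settings on each of the 4 hosts might look like this (broker and names are placeholders; `group_id` and `consumer_threads` are standard kafka input options):

```
input {
  kafka {
    bootstrap_servers => "kafka1:9092"        # placeholder broker
    topics            => ["topic-a"]          # the 8-partition topic
    group_id          => "logstash-topic-a"   # same group_id on all 4 hosts
    consumer_threads  => 2                    # 4 hosts x 2 threads = 8 consumers
  }
}
```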

Since you mention 7 topics, a first question is whether you have different credentials for each topic (e.g. if you are using JAAS, you can have only one JAAS file per Logstash, so you would need one LS agent for each topic).
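For reference, a JAAS file and the matching kafka input settings might look like the sketch below; the path, credentials, and SASL mechanism are placeholders. Note that `jaas_path` is applied as a JVM-wide system property, which is why the one-file-per-Logstash limit exists.

```
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="logstash"
  password="changeme";
};
```

```
input {
  kafka {
    # ... topics, group_id, consumer_threads as above ...
    security_protocol => "SASL_PLAINTEXT"
    sasl_mechanism    => "PLAIN"
    jaas_path         => "/etc/logstash/kafka_jaas.conf"   # placeholder path
  }
}
```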

After making the topic and partition calculations, start doing some benchmarks so you can see the maximum events/second of a Logstash instance running your pipeline code.
Start one Logstash with one worker and low settings (e.g. 1 GB heap, a pipeline batch size of 250 events, etc.), and increase the settings until you see no further improvement, e.g. with a command like the one below.
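A hypothetical benchmark invocation (the config file name is a placeholder; the heap is capped beforehand in config/jvm.options with -Xms1g / -Xmx1g rather than on the command line):

```
# Single-node benchmark run: one worker, batch size of 250 events.
bin/logstash -f my-pipeline.conf --pipeline.workers 1 --pipeline.batch.size 250
```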
