Best Shipper for 1500+ servers

Hi,
What would you suggest as the best shipper when we intend to ship data from 1500+ application servers? I've used Filebeat for hundreds of web servers, but putting the Filebeat agent on 1500+ application servers now seems like a lot.
Also, what is the throughput of Filebeat, as in, how many events per minute can it handle with its default settings?
And do I need an external queue, or would Logstash's persistent queue be enough to give some buffering?
I hope I can get a reply.
Thanks!

Hey @Shubhangi,

Why do you think that deploying 1500+ Filebeats would be a lot? There shouldn't be any problem with Filebeat even with several thousand servers. Look for example at this use case: Monitoring Petabytes of Logs per Day at eBay with Beats | Elastic Blog
Of course, with so many servers you will need to properly dimension your Elasticsearch cluster(s).

It depends a lot on your deployment, your machines, and the kind of logs collected. In any case, Filebeat should be able to handle the logs of any typical server.

Each Beat has an internal queue that stores events before publishing them. By default it can only hold a few thousand events in memory, but it can be configured to store more, and even to spool events to disk. This gives you some buffering in case you need to tolerate some downtime of your network or your Elasticsearch cluster. You can think of it as the Logstash persistent queue, but one in each Beat.
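In case it helps, here is a rough sketch of what those queue settings look like in filebeat.yml. The numbers are only illustrative, the spool (disk) queue has been a beta feature, and only one queue type can be enabled at a time, so check the reference for your exact version:

```yaml
# Memory queue: hold more events before they are acknowledged by the output.
# The default is only a few thousand events (queue.mem.events: 4096).
queue.mem:
  events: 16384
  flush.min_events: 2048
  flush.timeout: 5s

# Alternative (beta in recent versions): spool events to disk instead.
#queue.spool:
#  file:
#    path: "${path.data}/spool.dat"   # illustrative path
#    size: 512MiB
#  write:
#    buffer_size: 10MiB
```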

Beats also include a Kafka output, which lets you send all messages to a Kafka cluster, from where they can be collected with Logstash or with the Beats kafka input and then sent to Elasticsearch. This offers another way of buffering, and more possibilities for big deployments.
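As an example of what that looks like, a minimal output.kafka section in filebeat.yml could be something like the sketch below. The broker addresses and topic name are made up, and remember that a Beat can only have one output enabled at a time, so this replaces the Elasticsearch/Logstash output:

```yaml
output.kafka:
  # Hypothetical broker addresses, replace with your Kafka cluster.
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  # Topic to publish to; it can also be taken from a field, e.g. '%{[fields.log_topic]}'.
  topic: "app-logs"
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000
```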

Thanks for the reply!

It depends a lot on your deployment, your machines, and the kind of logs collected. In any case, Filebeat should be able to handle the logs of any typical server.

How can I know the maximum throughput of the Filebeat that I've set up? I can see the performance of a Filebeat instance in Kibana with X-Pack monitoring, but how can I know how much more it can handle? Same with Logstash: how much can my one Logstash instance take?

TL;DR: How do I test an instance of the Elastic Stack (ELK + Beats) and then scale it according to my needs? What's the best way to performance- and load-test the Elastic Stack?

Filebeat should be able to handle the logs of the server where it is installed without major problems.

Scaling Logstash and Elasticsearch is going to depend on your deployment, your log volume, and the kind of machines you are using.
You can start by sending data from a small set of servers and checking the resource usage in your Logstash and Elasticsearch instances; from there, slowly increase the number of servers sending data and infer when resources could start to be constrained. You will also have to take into account whether your log volume varies throughout the day; if it does, you may need some queuing system to handle the peaks.
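One practical way of doing this is to keep stack monitoring enabled while you grow the set of servers, and adjust the main output knobs while you watch the indexing rate. A sketch of the relevant filebeat.yml pieces, assuming 7.x-style option names (host names are placeholders):

```yaml
# Ship Filebeat's own metrics so you can watch event rates and queue usage in Kibana.
monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["https://monitoring-es:9200"]   # hypothetical monitoring cluster

# Optionally expose local stats for quick checks, e.g. curl http://localhost:5066/stats
http.enabled: true
http.port: 5066

# The main throughput knobs of the Elasticsearch output.
output.elasticsearch:
  hosts: ["https://es1:9200", "https://es2:9200"]
  worker: 2            # concurrent workers per configured host
  bulk_max_size: 1600  # maximum number of events per bulk request
```

The local stats endpoint should give you the libbeat pipeline counters (events published, acknowledged, dropped) per instance, which helps to see how close a Filebeat is to its limit.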

There are different architectures to consider, and the way of scaling them is going to vary. For example, you can:

  • Send data directly from Beats to Elasticsearch. Then you only need to take into account the performance of your Elasticsearch cluster, and you can increase its capacity by adding more nodes. You may consider enabling spooling to disk for the internal queue to get some buffering on the nodes (see the queue example above).
  • Send data from Beats to Kafka, and from there use Logstash or the Beats kafka input to send the data to Elasticsearch. This is a more complex deployment, but it gives you a buffer outside the servers where Beats are installed and lets you absorb peaks without needing to increase the number of Beats/Logstash readers (a sketch of a consumer follows this list).
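For the second option, the consumer side can be as simple as another Filebeat instance using the kafka input (available in recent Filebeat versions). A minimal sketch, reusing the made-up topic from the earlier example:

```yaml
filebeat.inputs:
  # Consume the events that the edge Beats published to Kafka.
  - type: kafka
    hosts: ["kafka1:9092", "kafka2:9092"]
    topics: ["app-logs"]
    group_id: "filebeat-consumers"

output.elasticsearch:
  hosts: ["https://es1:9200"]
```

Depending on how the edge Beats serialize their events, you may also want a processor such as decode_json_fields on the consumer side to unpack the original event from the Kafka message.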
