I need a high event rate into Logstash - at least 50k events per second - and I must also have a way of scaling further.
How can I forward 50k, 100k, or 150k events per second?
I have a central rsyslog server that stores logs from all my devices, and I need to forward those logs to Logstash.
I tried the syslog input plugin - 3.5k event rate
unix input plugin - 11k rate
tcp input plugin - about 50k rate
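For reference, a minimal benchmark config along these lines might look like the sketch below. The port number and file path are assumptions for illustration, not taken from the thread; the idea is just to measure raw input throughput with no filters in the way:

```conf
# Hypothetical Logstash 2.4 test config: tcp input, no filters, file output.
input {
  tcp {
    port  => 5140        # assumed port; rsyslog would forward here over TCP
    codec => line        # treat each line as one event, no syslog parsing
  }
}
output {
  file {
    path => "/tmp/throughput-test.log"   # assumed path, used only to count events
  }
}
```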
Performance will depend on the amount of processing you do on the events as well as the throughput supported by downstream systems, not just the throughput of the input plugin. What kind of processing will you be doing on your data? Where will you be sending it?
I know that performance depends on the Logstash filters and output, because I removed all my filters and set the output to file. Even with that stripped-down test config I see less than 50k events, and when I enable my Logstash filters (grok, aggregate, etc.) the event rate drops even further. I think that for high performance I must use logstash (w\o filters) -> redis -> logstash (with filters) -> elastic
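A sketch of that tiered layout, assuming a Redis list is used as the queue (the host, list key, and the grok pattern are placeholders, not values from the thread):

```conf
# ---- shipper.conf: no filters, just enqueue raw events in Redis ----
input {
  tcp { port => 5140 }               # assumed port for the rsyslog feed
}
output {
  redis {
    host      => "127.0.0.1"         # assumed Redis host
    data_type => "list"
    key       => "logstash"          # hypothetical list name
  }
}

# ---- indexer.conf (separate Logstash instance): filter and index ----
input {
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "logstash"          # must match the shipper's key
  }
}
filter {
  grok { match => { "message" => "%{SYSLOGLINE}" } }   # stand-in for the real filters
}
output {
  elasticsearch { hosts => ["127.0.0.1:9200"] }        # assumed ES endpoint
}
```

Because the indexers only coordinate through the Redis list, more of them can be started with the same config to drain the queue in parallel.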
OK, so the Logstash config you are referring to is basically a collector that does minimal processing and enqueues data in Redis. This is generally a good architecture, as you can have multiple Logstash indexers reading from it, which allows you to scale out horizontally. It will also provide buffering if the indexing is not able to keep up, which will reduce the risk of losing data at the source.
You will however probably also need to be able to scale out the collection layer horizontally. Most input plugins can be tuned quite a bit, so it would be useful to see the configurations you have used to reach the results mentioned.
I use the latest stable version, logstash-2.4.0.
In production I must use a single connection, because I have one central log server. But in my test I tried forwarding logs over several connections to the unix socket, and I still got only an 11k rate.
If the events are a similar size to the previous examples, I would agree that 3408 events per second is not very good. What does your beats input config look like?
Also, I tried the following scheme:
syslog-ng forwards logs to Redis, and Logstash reads them from Redis. The result is a large flow of logs from syslog-ng to Redis - the Redis queue grows very fast, but Logstash can only consume about 9k events per second.
Logstash redis input config:
I have not tuned the redis input in some time, but I recall the batch size parameter having an impact on performance. Try gradually increasing it to see how that affects throughput.
Increasing the batch size should give you a significant boost in throughput. To reach optimal performance for your use case you may need to be methodical and benchmark a few different combinations of worker threads (Logstash filter workers as well as input and output workers). I do not know where the limit is.
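Concretely, on the 2.4-era redis input the batch size is the batch_count setting, and threads controls input parallelism. The values below are starting points to benchmark, not recommendations, and the host and key are placeholders:

```conf
input {
  redis {
    host        => "127.0.0.1"   # assumed Redis host
    data_type   => "list"
    key         => "logstash"    # hypothetical list name
    batch_count => 125           # try e.g. 1 -> 50 -> 125 -> 250 and measure each
    threads     => 4             # parallel reader threads; tune together with batch_count
  }
}
```

Filter/output worker counts in Logstash 2.x are set on the command line, e.g. `bin/logstash -w 8 -f indexer.conf`, so they can be benchmarked independently of the config file.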
At some point you will need to scale out, though, and Redis (or another message queue) will allow you to have a number of Logstash instances reading off the same queue. In most ingest pipelines I see, however, the limit is usually the actual processing of the events or the throughput of the outputs.