Which Logstash input plugin is the fastest?

Hello everybody.

I need a high event rate into Logstash - at least 50k events per second - and I must also have a way of scaling further.
How can I forward 50k, 100k, 150k events per second?
I have a central rsyslog server that stores the logs of all my devices, and I need to forward those logs to Logstash.
I tried the syslog input plugin and got a 3.5k event rate,
the unix input plugin - 11k rate,
the tcp plugin - about 50k rate.

For testing I use a simple Logstash config, like this:

    input {
        # the input plugin under test (syslog, tcp, unix, ...) goes here
    }
    filter {
        metrics {
            meter => "in_events"
            add_tag => "metrics"
        }
    }
    output {
        if "metrics" in [tags] {
            file {
                path => "/var/log/logstash/logmeter"
                codec => line { format => "in_rate_1m: %{[in_events][rate_1m]}   out_rate_1m: %{[out_events][rate_1m]}" }
            }
        }
    }

Tell me, how many events per second do you get in your production setups?

Performance will depend on the amount of processing you do on the events as well as the throughput supported by downstream systems, not just the throughput of the input plugin. What kind of processing will you be doing on your data? Where will you be sending it?

I know that performance depends on the Logstash filters and outputs; that is why I removed all my filters and set the output to file. But if I see <50k events per second with this test config, then once I enable my Logstash filters (grok, aggregate, etc.) I will get an even lower event rate. I think that for high performance I must use logstash (w/o filters) -> redis -> logstash (with filters) -> elasticsearch.

OK, so the Logstash config you are referring to is basically a collector that does minimal processing and enqueues data in Redis. This is generally a good architecture, as you can have multiple Logstash indexers reading from it, which allows you to scale out horizontally. It will also provide buffering if the indexing layer is not able to keep up, which reduces the risk of losing data at the source.
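
As a rough sketch (the key name, the y.y.y.y Elasticsearch host, and the port are placeholders, not your actual values), the two stages could look something like this:

    # Collector: minimal processing, enqueue events into a Redis list
    input {
        tcp {
            port => 8000
            type => "log"
        }
    }
    output {
        redis {
            host => "x.x.x.x"
            data_type => "list"
            key => "logstash_queue"
        }
    }

    # Indexer: read from the same Redis list, apply filters, ship to Elasticsearch
    input {
        redis {
            host => "x.x.x.x"
            data_type => "list"
            key => "logstash_queue"
        }
    }
    filter {
        # grok, aggregate, etc. go here
    }
    output {
        elasticsearch {
            hosts => ["y.y.y.y:9200"]
        }
    }

This keeps the heavy filtering off the collection path, and every additional indexer simply points at the same list key.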

You will however probably also need to be able to scale out the collection layer horizontally. Most input plugins can be tuned quite a bit, so it would be useful to see the configurations you have used to reach the results mentioned.

I use the following options:

    syslog {
        port => 8000
        type => "log"
    }

    tcp {
        type => "log"
        port => 8000
    }

    unix {
        path => "/tmp/socket"
    }

My test server has the following configuration:

- CentOS 6.8
- 8 GB RAM
- 8 × Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

Which version of Logstash are you using? How many concurrent connections are you using when sending data to Logstash?

I use the latest stable version, logstash-2.4.0.
In production I must use one connection, because I have one central log server. But in my test I tried to forward logs over several connections to the unix socket, and I still got only an 11k rate.

I tried using filebeat. Filebeat forwards the logs to Logstash (8 workers). I created 4 log files: /tmp/file1.log /tmp/file2.log /tmp/file3.log /tmp/file4.log
filebeat config:
    filebeat:
      prospectors:
        -
          paths:
            - /tmp/*.log
          input_type: log
      registry_file: /var/lib/filebeat/registry
    output:
      logstash:
        hosts: ["x.x.x.x:8000"]
        worker: 8
        bulk_max_size: 10000

Other values in the filebeat config are at their defaults.

And I get the following result: in_rate_1m: 3408
I think this is very bad performance.

If the events are of a similar size to those in the previous examples, I would agree that 3408 events per second is not very good. What does your beats input config look like?
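
For reference, a minimal beats input on the Logstash side is usually as simple as this (the port is assumed here to match your filebeat output):

    input {
        beats {
            port => 8000
        }
    }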

My filebeat input config:

    filebeat:
      prospectors:
        -
          paths:
            - /tmp/*.log
          input_type: log
          scan_frequency: 0s
          harvester_buffer_size: 32768
      spool_size: 81920

Also, I tried the following scheme:
syslog-ng forwards the logs to Redis and Logstash reads them from Redis. The result is a large flow of logs from syslog-ng into Redis - the Redis list grows very fast - but Logstash can only consume about 9k events per second.
Logstash redis input config:

    redis {
        host => "x.x.x.x"
        data_type => "list"
        type => "log"
        key => "pp_rtest"
        threads => 8
    }

I have not tuned the redis input in some time, but I recall the batch size parameter having an impact on performance. Try gradually increasing it to see how that impacts throughput.
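
If I remember correctly, on the 2.x redis input that option is batch_count; a sketch reusing your config, with an arbitrary starting value:

    redis {
        host => "x.x.x.x"
        data_type => "list"
        type => "log"
        key => "pp_rtest"
        threads => 8
        batch_count => 1000   # raise gradually and measure; 1000 is only a starting point
    }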

Perhaps batch size does affect performance, but I doubt I can get a 50k event rate per second.
I think there should be a more general solution. Do you agree?

Increasing batch size should give you a significant boost in throughput. To get to the optimal performance for your use case you may need to be methodical and benchmark a few different combinations of worker threads (Logstash filter workers as well as input and output workers). I do not know where the limit is.
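
For example, with Logstash 2.x you can vary the pipeline workers and batch size on the command line while benchmarking (the numbers here are arbitrary starting points, not recommendations):

    bin/logstash -f test.conf -w 8 -b 1000

Worker or thread counts on individual plugins (such as threads on the redis input or workers on the elasticsearch output) can then be varied independently.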

At some point you will need to scale out though, and Redis (or another message queue) will allow you to have a number of Logstash instances reading off the same queue. The limit in most ingest pipelines I see is, however, usually the actual processing of the events or the throughput of the outputs.