Performance Issues with Logstash UDP/IPFIX


#1

Hello,

I have been trying to get Logstash to process as many IPFIX (Netflow v10) packets as possible. I have seen reports of Logstash users easily reaching 40k events per second, or even up to 90k:


I have tried many combinations of settings: flush_size, workers (input workers), queue_size, options in logstash.yml, and sysctl.conf parameters. My current Logstash configuration is as follows; with it I can process approximately 5k events per second.

input
{
        udp
        {
                port => 9995
                codec => netflow
                {
                        versions => [10]
                        target => ipfix
                }
                type => "ipfix"
                queue_size => 15000
                workers => 4
        }
}
filter
{
        metrics
        {
                meter => "events"
                add_tag => "metric"
        }
}
output
{
        if "metric" in [tags]
        {
                file
                {
                        path => "/var/log/logstash/metrics.log"
                        codec => line
                        {
                                format => "rate: %{[events][rate_1m]}"
                        }
                }
        }

        elasticsearch
        {
                hosts => [ "host1:9200"
                                , "host2:9200"
                                , "host3:9200" ]
                index => "ipfix-%{+YYYY.MM.dd}"
                flush_size => 500
        }
}

I run the Logstash instances as virtual machines on VMware ESX hosts. Each VM runs Ubuntu Server 16.10 with 8 cores and 8 GB RAM. The Logstash heap size is set to min: 2gb and max: 4gb. The virtual machines are connected via 10 Gbit/s fiber and use VMXNET3 adapters.

When I increase the number of flows per second I start to notice packet drops:

root@logstash:/var/log/logstash# netstat -su | grep errors
    85729629 packet receive errors

I have done a lot of searching and have tuned kernel parameters and ethtool settings, but I cannot get rid of the drops. I have tried many things over a long period. What could I be doing wrong?
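
One setting that is not in the list above is the udp input's receive_buffer_bytes option, which requests a larger socket receive buffer for the listener. The fragment below is only a sketch of how it would fit into the input shown earlier: the 16 MiB value is an assumption, not a measured recommendation, and on Linux the kernel limits the effective size according to net.core.rmem_max, so that sysctl would need to be at least as large. Whether this reduces the receive errors on this setup is untested.

```
input
{
        udp
        {
                port => 9995
                codec => netflow
                {
                        versions => [10]
                        target => ipfix
                }
                type => "ipfix"
                queue_size => 15000
                workers => 4
                # Assumed value (16 MiB): requested socket receive buffer size.
                # The kernel limits this according to net.core.rmem_max.
                receive_buffer_bytes => 16777216
        }
}
```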

Thanks a lot in advance!
-Gijs


#2

I have tried the generator input to find out how many events Logstash can theoretically process. The output was as follows (metrics.log, tested on a 6-core machine):

...
rate: 67697.47787382574
rate: 67765.33794824105
rate: 67852.94224895787
rate: 67931.44726019318
rate: 67968.97463324976
rate: 68013.52791363167
...
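
The exact pipeline used for this baseline is not shown. A minimal sketch of such a test, assuming the same metrics filter and file output as in the first post and swapping only the input, could look like this. The threads value is an assumption, and it is not stated whether the elasticsearch output was kept during the test, so it is left out here.

```
input
{
        generator
        {
                # No count is set, so events are generated until Logstash is stopped.
                # Assumed value, not taken from the original test.
                threads => 4
        }
}
filter
{
        metrics
        {
                meter => "events"
                add_tag => "metric"
        }
}
output
{
        if "metric" in [tags]
        {
                file
                {
                        path => "/var/log/logstash/metrics.log"
                        codec => line
                        {
                                format => "rate: %{[events][rate_1m]}"
                        }
                }
        }
}
```

A pipeline like this measures only event generation and the metrics filter, so its rate is an upper bound rather than a figure the UDP path should be expected to match.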

On the same machine, the UDP input with the Netflow codec does not get beyond:

...
rate: 4053.6175391807656
rate: 4057.2621115182033
rate: 4071.53721290261
rate: 4074.2447318672644
rate: 4074.640933231486
rate: 4081.098071688952
...

Is the Netflow plugin (logstash-codec-netflow) designed for a high number of flows per second? I need to be able to parse at least 40k flows per second, and probably a few multiples of that. I doubt whether the Netflow codec is the right way to go. If it is not, I need to find another way!

Thanks,
-Gijs

