UDP/Netflow Performance

Hi there,

Can anyone share their experience with server scaling and UDP performance?

My Logstash setup is quite simple:
UDP input with the Netflow codec -> no filters -> Elasticsearch output via the HTTP bulk API

My current test environment is based on a quite outdated 4-core Xeon (E5320) with 16GB RAM and 10k SAS drives in a RAID 1 configuration.

I'm collecting about 2k flows per second from one of our edge routers, which causes an average system load of 4.0 during daily peaks.
Nearly all of the CPU load is caused by the Java process Logstash runs in; Elasticsearch only utilizes half a core on average.

I'm wondering if this is normal behavior. I've read that Logstash itself is capable of processing 100k events per second, so I'm surprised that 2% of that causes such a high load.
Is JSON serialization for the ES output causing this high load?

Our production environment currently produces daily peaks of 10k flows/sec, and my production hardware should be able to process 20k flows/sec. What would a suitable server configuration look like?

Thanks in advance for all replies

best regards

What does your input block look like? Also, your output block? Which version of Logstash are you using?

This seems quite low, as I'm able to get 30k events per second with the UDP input plugin as it comes "out of the box." This is with Logstash 1.5, btw.
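If you want to establish a raw-throughput baseline yourself, a minimal config like the one below takes the Netflow codec and Elasticsearch out of the picture entirely (the port number is just an example). The dots codec writes one character per event to stdout, so you can estimate events per second by piping Logstash's output through a rate-measuring tool such as pv:

input {
    udp {
        port => 9995
    }
}
output {
    stdout {
        codec => dots
    }
}

If the event rate is high with this config but drops once you re-enable the Netflow codec or the Elasticsearch output, that tells you which stage is the bottleneck.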


As already mentioned, my setup is quite basic.

This is the whole configuration:

input {
    udp {
        port => 9995
        codec => netflow {
            versions => [5, 9]
        }
        workers => 4
        type => "netflow"
    }
}

filter {
}

output {
    elasticsearch {
        host => "localhost"
        cluster => "logstash"
        protocol => "http"
        flush_size => 4000
        index => "netflow-%{+YYYY.MM.dd}"
    }
}

I tried several worker thread settings in the input section and experimented with some bulk API settings, without any significant effect.

Currently I'm using Logstash 1.5.0.

Thanks for adding the information. This makes things clearer.

  1. flush_size => 4000 is inappropriate with Logstash 1.5, and is probably the bottleneck here. Since 1.5 was released, the output retries any messages which failed to be indexed via the bulk API. 500 is a more appropriate number; the default is 1000. We're in the midst of doing some performance testing with the new retry logic, and I believe it will shake out somewhere near 500. In older versions (where no retry logic existed) it was fire and forget: if Elasticsearch failed to index, you were out of luck.
  2. While workers => 4 may make sense for your udp input, the default of 2 should suffice, and you likely won't need more than 3. The queue_size directive (default 2000) may help if you're not ingesting fast enough. It sounds more like the output is the blocker, though.
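For what it's worth, here is a sketch of your config with both suggestions applied. The queue_size value of 10000 is only an illustrative starting point, not a tested recommendation — tune it based on whether you see drops at the input:

input {
    udp {
        port => 9995
        workers => 2
        queue_size => 10000
        codec => netflow {
            versions => [5, 9]
        }
        type => "netflow"
    }
}

output {
    elasticsearch {
        host => "localhost"
        cluster => "logstash"
        protocol => "http"
        flush_size => 500
        index => "netflow-%{+YYYY.MM.dd}"
    }
}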