Netflow Codec : UDP receive errors

Hi!

Thank you for your reply.
I've had some time to push the debugging further.

So as far as I can tell, it's not about backpressure! I tried without any output and still dropped some UDP packets, so it's more about processing power on the Logstash side.

I upgraded my server from 16 vCPUs to 32 vCPUs and tweaked some parameters.

/etc/logstash/logstash.yml:

pipeline.workers: 32
[...]
var.input.udp.receive_buffer_bytes: 33554432
var.input.udp.queue_size: 20000

sysctl:

net.core.rmem_max = 33554432

Since the upgrade I haven't seen a single drop, but I'm only at ~5,000 flows/sec today.

netstat -suna:

Udp:
    1061052 packets received
    32 packets to unknown port received.
    0 packet receive errors
    456 packets sent
    0 receive buffer errors
    0 send buffer errors
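In case anyone wants to watch these counters from a script instead of re-running netstat, the same numbers come from /proc/net/snmp. A minimal sketch (the column list in the sample is a subset of what the kernel actually exposes):

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp as a name -> value dict."""
    udp_lines = [l for l in snmp_text.splitlines() if l.startswith("Udp:")]
    names = udp_lines[0].split()[1:]   # header line: counter names
    values = udp_lines[1].split()[1:]  # value line: counter values
    return dict(zip(names, map(int, values)))

# Sample matching the netstat output above (real files have more columns)
sample = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 1061052 32 0 456 0 0\n"
)
counters = parse_udp_counters(sample)
print(counters["InErrors"], counters["RcvbufErrors"])  # 0 0
```

Polling this in a loop and diffing InErrors / RcvbufErrors makes it easy to see exactly when drops start during a burst.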

So, right now it works, but the load is kinda high for "only" 5k flows/sec:

top - 15:18:27 up  1:12,  1 user,  load average: 20.54, 21.17, 20.96
Tasks: 342 total,   1 running, 341 sleeping,   0 stopped,   0 zombie
%Cpu0  : 80.9 us,  0.0 sy,  0.0 ni, 19.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 88.8 us,  0.0 sy,  0.0 ni, 11.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 84.2 us,  0.0 sy,  0.0 ni, 15.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 86.2 us,  0.0 sy,  0.0 ni, 13.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 88.2 us,  0.7 sy,  0.0 ni, 11.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 83.3 us,  0.0 sy,  0.0 ni, 16.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 82.8 us,  0.0 sy,  0.0 ni, 17.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 76.6 us,  0.0 sy,  0.0 ni, 23.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  : 80.6 us,  0.0 sy,  0.0 ni, 19.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  : 76.3 us,  0.3 sy,  0.0 ni, 23.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 : 78.6 us,  0.0 sy,  0.0 ni, 21.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 : 75.7 us,  0.3 sy,  0.0 ni, 24.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 : 76.3 us,  0.0 sy,  0.0 ni, 23.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 : 80.3 us,  0.3 sy,  0.0 ni, 19.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 : 84.9 us,  0.3 sy,  0.0 ni, 14.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 : 75.7 us,  0.0 sy,  0.0 ni, 24.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 : 89.5 us,  0.0 sy,  0.0 ni, 10.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 : 75.7 us,  0.3 sy,  0.0 ni, 23.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 : 83.7 us,  0.3 sy,  0.0 ni, 16.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 : 86.1 us,  0.0 sy,  0.0 ni, 10.6 id,  0.0 wa,  0.0 hi,  3.3 si,  0.0 st
%Cpu20 : 79.0 us,  0.0 sy,  0.0 ni, 21.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 : 84.9 us,  0.3 sy,  0.0 ni, 14.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 : 65.3 us,  0.0 sy,  0.0 ni, 34.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 : 74.1 us,  0.0 sy,  0.0 ni, 25.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 : 80.9 us,  0.0 sy,  0.0 ni, 19.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 : 81.2 us,  0.3 sy,  0.0 ni, 18.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 : 77.1 us,  0.3 sy,  0.0 ni, 22.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 : 76.1 us,  0.3 sy,  0.0 ni, 23.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 : 70.8 us,  0.3 sy,  0.0 ni, 28.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 : 86.5 us,  0.0 sy,  0.0 ni, 13.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 : 87.5 us,  0.0 sy,  0.0 ni, 12.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 : 83.6 us,  0.3 sy,  0.0 ni, 16.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7992980 total,  2794544 free,  4820724 used,   377712 buff/cache
KiB Swap:  2047996 total,  2047996 free,        0 used.  2852120 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  1291 root      20   0 16.207g 4.568g 174448 S  2577 59.9   1388:44 java 

From what I read in the documentation, I should be able to handle 15,000 flows/sec with 16 cores, 16 workers, and a 16 MB buffer. In my case, I'm far from that.

So, now that the problem is clearly on the Logstash side, does anyone have an idea about such a big performance issue?

By the way, I'm receiving NetFlow v9 from a Cisco Catalyst 6k (version 12.2), with around 25 flows per UDP packet.
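Given those numbers, here's a quick back-of-the-envelope check of how much headroom my current queue and buffer settings should give at a 15k flows/sec burst (the ~1,500-byte packet size is just my guess, near the MTU):

```python
flows_per_sec = 15_000      # the burst rate I'm worried about
flows_per_packet = 25       # observed from the Catalyst 6k
queue_size = 20_000         # var.input.udp.queue_size (in packets)
buffer_bytes = 33_554_432   # var.input.udp.receive_buffer_bytes
packet_bytes = 1_500        # assumption: near-MTU NetFlow v9 packets

packets_per_sec = flows_per_sec / flows_per_packet
print(f"{packets_per_sec:.0f} packets/sec at burst")                # 600
print(f"{queue_size / packets_per_sec:.0f} s of queue headroom")    # ~33 s
print(f"{buffer_bytes / (packet_bytes * packets_per_sec):.0f} s of kernel buffer")  # ~37 s
```

So even at 15k flows/sec, the queue and socket buffer hold tens of seconds of traffic on their own; which makes me think the drops really come from the filter workers not keeping up, not from buffer sizing.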

(I tried to get my NetFlow from a Palo Alto firewall, but I was dropping almost everything. I found in this post that it's a Palo Alto implementation issue, because they send only 1 flow per packet: https://github.com/logstash-plugins/logstash-codec-netflow/issues/85)

Feel free to suggest anything that could lower Logstash's resource usage here, because I'm afraid that when I get a burst at 15k flows/sec, I'll start dropping packets again, even with 32 cores...

Thank you.

EDIT: Forgot to mention, I'm on CentOS 7 :slight_smile: