Hi!
Thank you for your reply.
I've got some time to push the debugging further.
As far as I can tell, it's not about backpressure! I tried without any output at all and still dropped some UDP packets, so it's more about processing power on the Logstash side.
I upgraded my server from 16 vCPUs to 32 vCPUs and tweaked some parameters.
/etc/logstash/logstash.yml:
pipeline.workers: 32
[...]
var.input.udp.receive_buffer_bytes: 33554432
var.input.udp.queue_size: 20000
sysctl:
net.core.rmem_max = 33554432
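For anyone following along, this is roughly how to apply and check that kernel limit (the sysctl.d filename is arbitrary, just the one I picked):

```shell
# Raise the maximum UDP receive buffer a socket may request (32 MiB),
# matching var.input.udp.receive_buffer_bytes above
sysctl -w net.core.rmem_max=33554432
# Persist it across reboots
echo 'net.core.rmem_max = 33554432' > /etc/sysctl.d/90-logstash.conf
# Verify the running value
sysctl net.core.rmem_max
```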
Since the upgrade, I haven't had a single drop, but I'm only at ~5,000 flows/sec today.
netstat -suna:
Udp:
1061052 packets received
32 packets to unknown port received.
0 packet receive errors
456 packets sent
0 receive buffer errors
0 send buffer errors
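To catch any future drops early, I'm keeping an eye on those counters; a rising "receive buffer errors" value would mean the kernel dropped datagrams before Logstash could drain the socket. A simple way to watch them (generic Linux commands, nothing Logstash-specific):

```shell
# The kernel's UDP counters straight from /proc; the second "Udp:" line
# holds the values (InDatagrams, NoPorts, InErrors, OutDatagrams,
# RcvbufErrors, SndbufErrors, ...). RcvbufErrors incrementing between
# samples means datagrams were dropped before Logstash read them.
grep '^Udp:' /proc/net/snmp
# For a live view, the same counters under watch:
#   watch -n 5 'netstat -suna'
```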
So right now it works, but the load is quite high for "only" 5k flows/sec:
top - 15:18:27 up 1:12, 1 user, load average: 20.54, 21.17, 20.96
Tasks: 342 total, 1 running, 341 sleeping, 0 stopped, 0 zombie
%Cpu0 : 80.9 us, 0.0 sy, 0.0 ni, 19.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 88.8 us, 0.0 sy, 0.0 ni, 11.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 84.2 us, 0.0 sy, 0.0 ni, 15.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 86.2 us, 0.0 sy, 0.0 ni, 13.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 88.2 us, 0.7 sy, 0.0 ni, 11.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 83.3 us, 0.0 sy, 0.0 ni, 16.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 82.8 us, 0.0 sy, 0.0 ni, 17.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 76.6 us, 0.0 sy, 0.0 ni, 23.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 80.6 us, 0.0 sy, 0.0 ni, 19.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 76.3 us, 0.3 sy, 0.0 ni, 23.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 78.6 us, 0.0 sy, 0.0 ni, 21.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 75.7 us, 0.3 sy, 0.0 ni, 24.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 76.3 us, 0.0 sy, 0.0 ni, 23.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 80.3 us, 0.3 sy, 0.0 ni, 19.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 84.9 us, 0.3 sy, 0.0 ni, 14.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 75.7 us, 0.0 sy, 0.0 ni, 24.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 89.5 us, 0.0 sy, 0.0 ni, 10.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 75.7 us, 0.3 sy, 0.0 ni, 23.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 83.7 us, 0.3 sy, 0.0 ni, 16.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 86.1 us, 0.0 sy, 0.0 ni, 10.6 id, 0.0 wa, 0.0 hi, 3.3 si, 0.0 st
%Cpu20 : 79.0 us, 0.0 sy, 0.0 ni, 21.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 84.9 us, 0.3 sy, 0.0 ni, 14.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 65.3 us, 0.0 sy, 0.0 ni, 34.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 74.1 us, 0.0 sy, 0.0 ni, 25.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 80.9 us, 0.0 sy, 0.0 ni, 19.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu25 : 81.2 us, 0.3 sy, 0.0 ni, 18.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 77.1 us, 0.3 sy, 0.0 ni, 22.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu27 : 76.1 us, 0.3 sy, 0.0 ni, 23.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 70.8 us, 0.3 sy, 0.0 ni, 28.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu29 : 86.5 us, 0.0 sy, 0.0 ni, 13.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu30 : 87.5 us, 0.0 sy, 0.0 ni, 12.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu31 : 83.6 us, 0.3 sy, 0.0 ni, 16.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7992980 total, 2794544 free, 4820724 used, 377712 buff/cache
KiB Swap: 2047996 total, 2047996 free, 0 used. 2852120 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1291 root 20 0 16.207g 4.568g 174448 S 2577 59.9 1388:44 java
According to the documentation, I should be able to handle 15,000 flows/sec with 16 cores, 16 workers, and a 16 MB buffer. In my case, I'm far from that.
So, now that the problem is clearly on the Logstash side, does anyone have an idea what could cause such poor performance?
By the way, I'm receiving NetFlow v9 from a Cisco Catalyst 6k (version 12.2), with around 25 flows per UDP packet.
(I tried to get my NetFlow from a Palo Alto firewall, but I was dropping almost everything. I found in this issue that it's a Palo Alto implementation problem, because they send only 1 flow per packet: https://github.com/logstash-plugins/logstash-codec-netflow/issues/85)
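Putting those numbers together, the flows-per-packet ratio makes a big difference to the packet rate the UDP input has to keep up with (figures taken from this thread):

```shell
# Packets/sec = flows/sec divided by flows per packet
echo $((5000 / 25))   # Catalyst 6k at today's 5k flows/sec: 200 pkt/s
echo $((15000 / 25))  # expected 15k flows/sec burst: 600 pkt/s
echo $((5000 / 1))    # Palo Alto (1 flow/packet): 5000 pkt/s at the same 5k flows/sec
```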
Feel free to suggest anything that could lower Logstash's resource needs, because I'm afraid that when I get a burst at 15k flows/sec, I'll drop packets again, even with 32 cores...
Thank you.
EDIT: I forgot to mention, I'm on CentOS 7.