Netflow Codec: UDP receive errors

Hi,

I have installed Logstash with the Netflow module to catch my flows and output them to ES, and it works great: I can see my stats in Kibana without problems.

I have between 5k and 12k flows/sec depending on the time of day.

A dedicated VM with 16 CPUs and 8 GB of RAM is doing the job, but netstat shows a lot of receive errors:

Udp:
1146481 packets received
89 packets to unknown port received.
59566 packet receive errors
475 packets sent
59566 receive buffer errors
0 send buffer errors
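
To see whether those counters keep climbing while traffic is flowing in, a simple watch loop around netstat does the trick, for example:

    watch -n 10 "netstat -su | grep -E 'receive (buffer )?errors'"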

I read that 16 CPUs with the configuration above could handle 15k flows/sec, so I'm surprised... The config file looks like this:

input {
  udp {
    port => 9996
    codec => netflow
    receive_buffer_bytes => 16777216
    workers => 16
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch.xxx.yyy:9200"]
  }
}

The rmem_max parameter is configured accordingly:

net.core.rmem_max = 16777216
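
For reference, one way to make this persistent across reboots (run as root; the file name under /etc/sysctl.d is arbitrary):

    sysctl net.core.rmem_max                                   # check the current value
    echo 'net.core.rmem_max = 16777216' > /etc/sysctl.d/90-netflow.conf
    sysctl -p /etc/sysctl.d/90-netflow.conf                    # apply without a reboot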

Heap size is 4G:

-Xms4g
-Xmx4g

Does anyone have an idea how to mitigate those drops?

Is there any way to confirm that those drops are related to Logstash? I'm not getting any logs...

Are there any parameters I could play with to try to handle this?

Any help would be appreciated.

What do you see in top? Are the CPUs equally loaded? Are any of the cores close to being saturated?

It's more of a guess, but I'd recommend checking /proc/interrupts to see if by any chance most of the interrupts go to a single CPU.
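
Something like this will show whether the NIC's interrupts, and the resulting softirq work, are spread across cores or stuck on one (mpstat comes from the sysstat package):

    grep -E 'CPU|eth0' /proc/interrupts    # per-CPU interrupt counts for the NIC
    mpstat -P ALL 5                        # watch the %soft column per core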

It might also be worth trying Logstash with a null output to see if the drops still happen; that would tell you whether the slowdown is caused by back-pressure.
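
A rough sketch of such a test, keeping your current UDP/netflow input but printing a dot per event instead of indexing (adjust the path to your install):

    /usr/share/logstash/bin/logstash -e '
      input {
        udp {
          port => 9996
          codec => netflow
          workers => 16
          receive_buffer_bytes => 16777216
        }
      }
      output {
        stdout { codec => dots }
      }
    '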

I am betting on backpressure. How many ES nodes do you have, and what is the storage? SSD or HDD? How many drives? RAID?

Do you have Metricbeat on the ES nodes? If so, what does write IOPS look like?
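
If you don't have Metricbeat, even a plain iostat on the ES nodes (from the sysstat package) gives a rough idea:

    iostat -x 5    # watch w/s (write IOPS) and %util per device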

Hi,

The CPUs are equally loaded; most of the time they run between 50% and 80%, which should be OK.

You are right about the interrupts though...

/proc/interrupts shows this:

19: 186 0 0 118665877 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi eth0

CPU3 is handling all the network interrupts.

I tried to modify /proc/irq/19/smp_affinity_list, but somehow I can't...
I'm not a Linux expert...

Do you think that's my problem?

I will also investigate backpressure.

I have a small cluster (3 nodes) handling a lot of logs (5k/sec), and right now I'm sending everything to the same node. I will try to load-balance across the 3 nodes with Logstash.
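
If I understand the docs correctly, listing all 3 nodes in the output should be enough to spread the requests, something like this (node names are placeholders):

output {
  elasticsearch {
    hosts => ["es-node1.xxx.yyy:9200", "es-node2.xxx.yyy:9200", "es-node3.xxx.yyy:9200"]
  }
}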

Is there a good way to check whether Elasticsearch is overloaded?

Looking at (h)top, everything seems fine.

Thanks a lot!!

Nobody? :frowning:

I still have a lot of receive buffer errors and I can't figure out how to mitigate that...

I've already spent hours trying different things, like configuring gigantic buffer sizes, but nothing helps.

My ES node doesn't look overloaded, but I don't know how to verify that precisely.

Any help would be greatly appreciated.

Kr

Can you try, for a short amount of time, without the ES output, just with the dots output? That would clarify whether back-pressure is the reason or not.

I think the IRQ balancing might also be the reason. I don't know the best way to balance the interrupts on your system, but I see you tried /proc/irq/19/smp_affinity_list; can you also try /proc/irq/19/smp_affinity?
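
As a rough example (run as root; the value is a hex bitmap of allowed CPUs, and some virtual NICs or irqbalance may override it):

    cat /proc/irq/19/smp_affinity          # current CPU mask
    echo ff > /proc/irq/19/smp_affinity    # allow CPUs 0-7, for example
    # or the list form, if the driver accepts it:
    echo 0-7 > /proc/irq/19/smp_affinity_list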

What Linux distribution are you on? And what network card / network driver do you have?

Hi!

Thank you for your reply.
I've had some time to push the debugging further.

So, as far as I can tell, it's not about backpressure! I tried without any output and I still dropped some UDP packets, so it's more about processing power on the Logstash side.

I upgraded my server from 16 vCPUs to 32 vCPUs and tweaked some parameters.

/etc/logstash/logstash.yml:

pipeline.workers: 32
[...]
var.input.udp.receive_buffer_bytes: 33554432
var.input.udp.queue_size: 20000

sysctl:

net.core.rmem_max = 33554432

Since the upgrade I haven't had a single drop, but I'm only at ~5,000 flows/sec today.

netstat -suna:

Udp:
    1061052 packets received
    32 packets to unknown port received.
    0 packet receive errors
    456 packets sent
    0 receive buffer errors
    0 send buffer errors

So, right now it works, but the load is kind of high for "only" 5k flows/sec:

top - 15:18:27 up  1:12,  1 user,  load average: 20.54, 21.17, 20.96
Tasks: 342 total,   1 running, 341 sleeping,   0 stopped,   0 zombie
%Cpu0  : 80.9 us,  0.0 sy,  0.0 ni, 19.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 88.8 us,  0.0 sy,  0.0 ni, 11.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 84.2 us,  0.0 sy,  0.0 ni, 15.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 86.2 us,  0.0 sy,  0.0 ni, 13.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 88.2 us,  0.7 sy,  0.0 ni, 11.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 83.3 us,  0.0 sy,  0.0 ni, 16.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 82.8 us,  0.0 sy,  0.0 ni, 17.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 76.6 us,  0.0 sy,  0.0 ni, 23.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  : 80.6 us,  0.0 sy,  0.0 ni, 19.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  : 76.3 us,  0.3 sy,  0.0 ni, 23.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 : 78.6 us,  0.0 sy,  0.0 ni, 21.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 : 75.7 us,  0.3 sy,  0.0 ni, 24.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 : 76.3 us,  0.0 sy,  0.0 ni, 23.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 : 80.3 us,  0.3 sy,  0.0 ni, 19.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 : 84.9 us,  0.3 sy,  0.0 ni, 14.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 : 75.7 us,  0.0 sy,  0.0 ni, 24.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 : 89.5 us,  0.0 sy,  0.0 ni, 10.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 : 75.7 us,  0.3 sy,  0.0 ni, 23.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 : 83.7 us,  0.3 sy,  0.0 ni, 16.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 : 86.1 us,  0.0 sy,  0.0 ni, 10.6 id,  0.0 wa,  0.0 hi,  3.3 si,  0.0 st
%Cpu20 : 79.0 us,  0.0 sy,  0.0 ni, 21.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 : 84.9 us,  0.3 sy,  0.0 ni, 14.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 : 65.3 us,  0.0 sy,  0.0 ni, 34.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 : 74.1 us,  0.0 sy,  0.0 ni, 25.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 : 80.9 us,  0.0 sy,  0.0 ni, 19.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 : 81.2 us,  0.3 sy,  0.0 ni, 18.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 : 77.1 us,  0.3 sy,  0.0 ni, 22.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 : 76.1 us,  0.3 sy,  0.0 ni, 23.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 : 70.8 us,  0.3 sy,  0.0 ni, 28.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 : 86.5 us,  0.0 sy,  0.0 ni, 13.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 : 87.5 us,  0.0 sy,  0.0 ni, 12.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 : 83.6 us,  0.3 sy,  0.0 ni, 16.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7992980 total,  2794544 free,  4820724 used,   377712 buff/cache
KiB Swap:  2047996 total,  2047996 free,        0 used.  2852120 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  1291 root      20   0 16.207g 4.568g 174448 S  2577 59.9   1388:44 java 

What I read in the documentation is that I should be able to handle 15,000 flows/sec with 16 cores, 16 workers, and a 16 MB buffer. In my case, I'm far from that.

So, now that the problem is clearly on the Logstash side, does anyone have an idea why it needs so much processing power?

By the way, I'm receiving Netflow v9 from a Cisco Catalyst 6k (version 12.2), with around 25 flows per UDP packet.

(I tried to get my Netflow from a Palo Alto firewall, but I was dropping almost everything. I found in this issue that it's a Palo Alto implementation problem, because they send only 1 flow per packet: https://github.com/logstash-plugins/logstash-codec-netflow/issues/85)

Feel free to suggest anything that could lower Logstash's resource needs, because I'm afraid that when I get a burst at 15k flows/sec I will drop packets again, even with 32 cores...

Thank you.

EDIT: forgot to mention, I'm on CentOS 7 :slight_smile:

Can you go through https://www.elastic.co/guide/en/logstash/current/performance-troubleshooting.html and https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html, if you haven't already, and see if any of the tips there help?

Logstash doesn't actually scale well across an increasing number of CPU cores. If you do some research on the scale testing done by Jorrit Folmer (the creator of the Netflow codec), you will notice how the number of events per core drops significantly as more cores are added. My own benchmark tests have confirmed this. If I had 16 cores available I would, at a minimum, run two instances of Logstash with 8 cores each. 4 x 4 cores would likely be even better. Either manually configure devices to send to a particular Logstash instance (manually balancing the load) or put a load balancer in front.
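
As a rough illustration, two instances on the same host just need their own settings and data directories, each with its own logstash.yml (pipeline workers, a different UDP port, etc.); the paths below are only examples:

    /usr/share/logstash/bin/logstash --path.settings /etc/logstash-a --path.data /var/lib/logstash-a
    /usr/share/logstash/bin/logstash --path.settings /etc/logstash-b --path.data /var/lib/logstash-b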

You also have to consider back-pressure. I asked about storage before. Insufficient write IOPS capacity is the #1 reason for back-pressure in these kinds of use cases. HDD vs. SSD, RAID levels, etc. all make a difference. You can't break the laws of physics!

Almost all of my customers are processing network flow data (as well as firewall logs, which are similar in content and volume), and as the creator of ElastiFlow I support an even larger number of users. So I have A LOT of experience making the Elastic Stack scale for these use-cases. What I can tell you with certainty is that you MUST get two things right:

  1. Using the right storage. SSDs will certainly be a necessity at the flow rates you are looking to handle. Multiple Elasticsearch nodes may also be necessary.

  2. Scaling Logstash horizontally (more instances) rather than vertically (more CPUs).

What you will quickly learn is that, once you give it the right resources, Elasticsearch is great at ingesting data (even better than the sizings an Elastic Solutions Architect will give you). The more challenging part of a successful deployment, and where most of your effort will be invested, is large-scale data collection, processing, enrichment, and transport.

Hi everyone,

I will review the 2 links provided by Tudor.

Robert, the storage architecture is handled by another team. I think it's SAS HDDs, and probably not the fastest setup ever; however, I was dropping a ton of packets on Logstash with 16 vCPUs, and with 32 vCPUs I haven't dropped a single packet out of 20,000,000. Elasticsearch is composed of 3 nodes and doesn't look pressured right now, so I'm pretty sure my initial problem wasn't about backpressure.

However, if I keep adding more and more input to my cluster, that might become the case.

From the Logstash perspective, I receive Netflow traffic from only one router, so I can't load-balance at the source.

I'm waiting for next week to see how it behaves on a busy day (it's the Easter holiday right now, so there aren't that many flows).
If I start dropping again, maybe I should think about splitting my Logstash into 2 instances with 16 cores or 4 instances with 8 cores, behind a load balancer... But that really feels like "too much" to me...

We had an old and shitty collector for years, and it could handle everything with only 8 CPUs. Maybe Logstash adds more info (like GeoIP), but 8 CPUs vs 32 CPUs is a lot more resources.

Thank you for your advice.

Consider this table provided by Jorrit...

vCPU   UDP workers   flows/sec
   1             1        2300
   2             2        4300
   4             4        6700
   8             8        9100
  16            16       15000
  32            32       16000

What this means is that 4 instances with only 2 vCPUs each (8 cores total) will handle more flows than a single 32-vCPU instance: 4 x 4300 = 17,200 flows/sec vs 16,000.

You should be able to put NGINX in front of those four instances as a round-robin UDP load balancer and end up with more capacity than you have now with a single 32-core instance.
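
Roughly like this in nginx.conf (ports and addresses are just examples; proxy_responses 0 tells NGINX not to wait for replies, since Netflow is one-way):

stream {
  upstream netflow_collectors {
    server 127.0.0.1:9996;
    server 127.0.0.1:9997;
    server 127.0.0.1:9998;
    server 127.0.0.1:9999;
  }
  server {
    listen 2055 udp;
    proxy_pass netflow_collectors;
    proxy_responses 0;
  }
}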

You might also be interested in this issue...
