Data loss using UDP input plugin

Almost 80% of the data is lost when I use the UDP input plugin for netflow data. Below is my configuration file.
```
input {
  udp {
    queue_size => 50000
    port => 9993
    type => "netflow"
    workers => 4
    codec => netflow { versions => [5] }
  }
}
```

```
output {
  kafka {
    broker_list => ""
    topic_id => "storm"
    producer_type => "async"
    batch_num_messages => 50000
    queue_buffering_max_messages => 50000
    queue_buffering_max_ms => 50
    queue_enqueue_timeout_ms => -1
    workers => 5
  }
}
```

Not sure where I am going wrong. Previously I was using the default values for all of the plugin properties; after increasing some of the buffer sizes I saw some improvement.

Logstash is running on a 4-core machine and all 4 cores are showing 80-90% usage.

Is the Kafka output slowing down and causing packets to be dropped from the buffers, or is the UDP input configuration wrong? I am using Logstash 1.5.0.

It's UDP so it's not guaranteed.
Are you sure it's all reaching LS?

Thanks warkolm. We ran tcpdump to capture the UDP traffic on that machine; when we compared the tcpdump capture with the data Logstash collected, we discovered this data loss. We used the Logstash file output for this test. Is there any other way to identify where exactly we are losing the data?

Does it happen if you just use a basic UDP input and a simple file output?
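A minimal config for that kind of test might look like the following (the output path here is just a placeholder, not from the thread):

```
input {
  udp {
    port => 9993
    codec => netflow { versions => [5] }
  }
}
output {
  # write decoded events to a local file for comparison with the tcpdump capture
  file { path => "/tmp/netflow-test.log" }
}
```

Comparing event counts in the file against the tcpdump packet count isolates the input side from any Kafka-related slowness.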

It's the same with both the file output and the Kafka output.

What sort of throughput are you trying to process?

According to tcpdump: 4 lakh (400,000) UDP packets per minute, amounting to 14 million netflow records per minute.

That is around 240k messages per second (14 million / 60 ≈ 233,000). That sounds like a lot for a single Logstash instance to handle. If you are only successfully capturing 20% of these events, you will likely need to spread the load across a larger number of Logstash instances.

Yep Christian, I thought the same, but we are currently listening on a single port for the UDP traffic.
How can I share it between 2 different Logstash instances?

I am not sure you can have multiple Logstash instances listening on the same port on a single host, but even if you could, you might be limited by the resources of the server. What does resource usage look like on the host when you are collecting traffic? Is anything limiting throughput, e.g. CPU?

You might be able to scale out to multiple instances by using a load balancer able to handle UDP, or possibly even by setting up DNS round robin.
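As a sketch of the load-balancer approach, a recent nginx build with the stream module can proxy UDP to several Logstash instances. The hostnames and ports below are placeholders, not from the thread:

```nginx
# Assumes nginx built with the stream module and a version that supports
# "listen ... udp" — hostnames here are hypothetical examples.
stream {
    upstream logstash_udp {
        server logstash1.example.com:9993;
        server logstash2.example.com:9993;
    }
    server {
        listen 9993 udp;
        proxy_pass logstash_udp;
        proxy_responses 0;  # netflow is one-way; don't wait for replies
    }
}
```

Each exporter keeps sending to one address, and the proxy spreads the packets across the backends.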

I have 4 UDP input workers and that machine has 4 cores; across all 4 CPUs, 350-360% CPU is being used.

If it uses that amount of CPU for processing 20% of the traffic, you will need to get a host with more CPU (as that seems to be the limiting factor) or scale out.

Christian, don't you think Kafka is taking time and we might be losing our data there?

It is quite possible that the Kafka output plugin is limiting throughput to some extent, but I am not sure exchanging it for some other output plugin would improve performance. Given the gap between the current throughput level and what is required, you will need to scale up and/or out.

You can test the throughput of the Kafka plugin by running a generator input with a dots codec.

That'll give you an idea of your capacity.
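A benchmark config along those lines might look like this (the event count and message text are placeholder values, and `broker_list` would need your actual brokers):

```
input {
  generator {
    count => 1000000           # stop after one million synthetic events
    message => "test message"  # payload content doesn't matter for this test
  }
}
output {
  kafka {
    broker_list => "..."       # fill in your broker list
    topic_id => "storm"
  }
  stdout { codec => dots }     # prints one dot per event processed
}
```

Timing how fast the dots appear shows the maximum rate the Kafka output can sustain, independent of the UDP input.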

Sorry for the wrong data; we are actually receiving 20K UDP packets per minute. So, is one instance of Logstash capable of parsing that?

With the above configuration I am able to capture 90% of the data for the first 5 minutes, after which the data loss starts. Why this inconsistency?

Set your Kafka workers to 1 and see if that helps. You aren't going to get
any more performance by having it larger than 1, due to how Logstash
handles parallelism. Using async mode will definitely drop messages if the
buffer is slow. I'd also set queue_buffering_max_ms much higher, like 5000,
as 50 ms is going to chew through CPU and could affect your throughput (too
many small batches going out). Set your batch_num_messages to ~1/50th of
your max, so 1000, to balance out the queue buffer max ms being higher.

Try that out and let us know!
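Applying those suggestions to the original config, the kafka output block would look something like this (`broker_list` left empty as in the original):

```
output {
  kafka {
    broker_list => ""
    topic_id => "storm"
    producer_type => "async"
    batch_num_messages => 1000       # ~1/50th of the queue max
    queue_buffering_max_messages => 50000
    queue_buffering_max_ms => 5000   # fewer, larger batches
    queue_enqueue_timeout_ms => -1
    workers => 1                     # >1 adds no throughput here
  }
}
```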

Thanks Joe, I tried your suggestion; it's still the same. It captures 100% of the data for the first 6 minutes and then falls back to capturing 20%.

That definitely sounds like a bottleneck somewhere.

Try benchmarking with the dots codec, just hitting stdout:

```
output {
  stdout { codec => dots }
}
```

```
$ bin/logstash -f test.conf | pv -Wr > /dev/null
```

That'll tell you whether Logstash itself has the throughput.