UDP input plugin monitoring and Elasticsearch output

Hello Logstash gurus, I have a Logstash instance running live and the guy who set it up quit his job :frowning: Since I am totally new to this and already have a live system I need to take care of, some help would be much appreciated :slight_smile:

  1. My first question: how do you monitor the UDP input plugin queue? I tried the node stats API (the call I'm making is sketched right after this list), but all I got was:

"plugins": {

       "inputs": [],

and where I guess the info about the queue should be:

   "queue": {

       "type": "memory"

   }

The UDP input plugin documentation says that monitoring is active by default, and I don't have it disabled in the config, so what gives? Is it because the queue type has to be persistent to see any relevant info? And why is the inputs info empty too? I get other info about filters and such just fine.

  2. I am having some problems with my storage even though I have M.2 SSDs, so I am trying to optimize the process at least a little, and I am wondering how the UDP input and Elasticsearch output work together. I send in a UDP packet of around 1 KB which contains, let's say, about 1 000 events that I parse on input. Does Logstash filter everything and have the Elasticsearch output ship those 1 000 events right away, or does the input keep going until the output buffer reaches 20 MB and only then write in bulk, regardless of the fact that one parsed UDP packet comes to only around 4 KB? That part is unclear to me from the documentation. Is it queue -> input -> filter -> output, or queue -> input -> filter -> buffer -> input -> filter -> buffer until the buffer reaches 20 MB?
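To be specific about question 1, this is roughly the call I'm making (just a sketch; I'm assuming the monitoring API on its default localhost:9600, your host and port may differ):

    # query the Logstash monitoring API for node stats -- this is where the
    # "plugins" and "queue" sections quoted above come from
    curl -s 'http://localhost:9600/_node/stats?pretty'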

Hi! Sorry to hear you got thrown feet first into the fire.

First, it doesn't look like you have any inputs running at all, if that's what the node stats API is reporting. Are you sure the LS instance you're hitting is running the config you think it's running?

Input plugins don't maintain their own queues; there's a single global queue for Logstash. That queue can be either in-memory (the default) or persistent (buffered to disk).

The size of the in-memory queue is fixed and is equal to BATCH_SIZE*NUM_WORKERS. You can set these values with the -b and -w options respectively.
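For example, something like this (the path and values below are placeholders to illustrate the flags, not a recommendation):

    # start Logstash with a batch size of 250 events and 6 pipeline workers
    bin/logstash -f /etc/logstash/conf.d/pipeline.conf -b 250 -w 6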

You can't really monitor the in-memory queue but you can monitor the persistent queue.
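If you do want queue metrics, a minimal logstash.yml sketch for switching to the persistent queue could look like this (the path and size limit are just example values):

    # logstash.yml -- enable the persistent (disk-backed) queue
    queue.type: persisted
    path.queue: /var/lib/logstash/queue   # example location, adjust to your disk layout
    queue.max_bytes: 4gb                  # example cap on disk usage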

WRT buffering, I'd recommend reading https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html.

The flow in your case is UDP -> decode with codec -> queue -> filters -> ES output. The only buffering is at the in-memory queue, where we group events up to the batch size specified with -b. If the batch delay (5 ms by default) is exceeded between receiving any two events, the batch is flushed to the rest of the pipeline.
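Concretely, a pipeline like yours is typically shaped something like this (a minimal sketch only; the port, codec, filter, and hosts are placeholders, not your actual config):

    input {
      udp {
        port  => 5514            # placeholder port
        codec => json            # placeholder codec -- decoding happens here, before the queue
      }
    }

    filter {
      # your parsing/enrichment runs here, after events come off the queue in batches
      mutate { remove_field => ["host"] }   # placeholder filter
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]  # placeholder -- each batch is sent as a bulk request
      }
    }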

Internally, the ES output might break that batch up into 20 MB chunks if it is large. This would only happen if the serialized size of your batch exceeded 20 MB, which is quite big: at roughly 500 bytes per serialized event, that works out to around 40k events. We hard-coded the size at 20 MB because it basically doesn't get any faster at larger values, and we prefer not to make things tunable that don't need to be tuned.


Thanks a lot for your reply. I was looking around for info, but I guess not a lot of people are using the UDP input, and maybe my questions go a bit beyond "I can't get Logstash running".

You mentioned that input plugins don't maintain their own queues, but the UDP input documentation states otherwise.

The queue_size parameter:

    Value type is number
    Default value is 2000

    This is the number of unprocessed UDP packets you can hold in memory before packets will start dropping.
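For context, that setting lives on the input itself, roughly like this (a sketch; the port and worker count are placeholders):

    input {
      udp {
        port       => 5514      # placeholder port
        queue_size => 200000    # per-input buffer of unprocessed UDP packets (default 2000)
        workers    => 4         # placeholder: threads draining packets off this buffer
      }
    }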

I tweaked everything, including batch sizes, worker counts, and multiple queue parameters, and I still get UDP packet drops after some time (probably because the queue mentioned above eventually floods, even though I set it to 200 000 UDP packets), but I think I understand why: I simply feed the OS more UDP packets than Logstash can process. What I still don't get, though, is why CPU utilization is not at 100% even though I have 12 workers running on 6 cores... I see around 80% on the CPUs even though the bottleneck seems to be UDP input and decoding. Shouldn't that take up 100% CPU?

You are correct! The UDP input is an exception. My apologies for the mistake.

Have you benchmarked other UDP servers? Have you tried sending over localhost? UDP packets can be dropped for more reasons than just Logstash; basically anything in your networking stack can be a bottleneck there.
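One quick way to take the network out of the equation is something like this (a rough sketch, assuming a bash with /dev/udp support and a Logstash UDP input listening locally on a placeholder port 5514):

    # fire a burst of test datagrams at the local listener and watch whether drops still occur
    for i in $(seq 1 10000); do
      echo "test event $i" > /dev/udp/127.0.0.1/5514
    done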

One other thing to consider: if you are sensitive to packet loss, would TCP be a better choice of protocol?


Have you benchmarked other UDP servers?

Yes, I have multiple UDP streams and multiple Logstash instances to take the load, and they all show similar results depending on the Logstash server's hardware.

Have you tried sending over localhost?

Networking is not the issue, because I monitor everything I can and there is no packet loss until the Logstash instance suddenly stops reading from the socket; the socket then fills up at the OS level and starts dropping any new incoming packets. If you pause the packet stream and give it time, all the buffers clear out in a few minutes and you can resume the stream without any problems. The problem is I can't just pause the stream from time to time and lose that many packets, because the stream is ~2k pps (packets per second). UDP was chosen because it performs better at the network level; we are talking about huge loads that the routers already carry, and since there are only a few errors and a clear network performance difference between TCP and UDP, we chose UDP.
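For anyone following along, this is the kind of OS-level check I mean (a sketch, assuming Linux; counter names vary between kernels):

    # watch kernel UDP counters -- "receive buffer errors" climbing means the
    # socket buffer filled up because the application stopped draining it
    netstat -su | grep -A4 'Udp:'

    # per-socket view: Recv-Q growing on the Logstash UDP port points the same way
    ss -ulnpm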

I did some more digging and found another person with pretty much the same performance issues. He was talking about being able to stream ~600-700 pps per instance without problems, which are basically the same numbers I get. So my only option is probably to put a couple more Logstash instances to work and forget about it for now, until the load gets higher... This whole idea of streaming packets directly, without any means of monitoring the queue size, seems wrong to me, and I will probably have to find a better solution at some point.
