Multiple outputs?

Hi Experts,
I want to send events to multiple outputs but don't understand the options I have.
The docs say:
"Outputs are the final phase of the Logstash pipeline. An event can pass through multiple outputs, but once all output processing is complete, the event has finished its execution."

This is my output:
output {
  redis {
    host => "192.168.10.90"
    data_type => "list"
    key => "logstash"
  }

  tcp {
    codec => "json_lines"
    host => "192.168.10.215"
    port => "6501"
  }
}

But when I add the tcp output, my log flow into Redis drops by half.

So my questions are:

  1. Does an event pass through multiple outputs or just one?
  2. How come the event rate into Redis drops by half when I add TCP? It should be all or nothing, right?
  3. Do I need to clone all events to get this to work?
  4. Can I run two Logstash instances with separate config files listening to the same UDP input and then sending to different outputs?

Br
MA

  1. Does an event pass through multiple outputs or just one?

Events are sent to all outputs. (Conditionals can change this but that's not the case for you.)
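For reference, conditional routing in the output section would look something like this (the `type` values here are made up for illustration; the hosts are the ones from your config):

```
output {
  if [type] == "metrics" {
    redis {
      host => "192.168.10.90"
      data_type => "list"
      key => "logstash"
    }
  } else {
    tcp {
      codec => "json_lines"
      host => "192.168.10.215"
      port => "6501"
    }
  }
}
```

Without conditionals like these, every event goes to every output listed.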

  2. How come the event rate into Redis drops by half when I add TCP? It should be all or nothing, right?

Yes.

Thanks for the quick reply.

Why does the TCP output then affect the event rate into Redis?

Br
Ma

What's affected here, the event rate or the count? That the rate decreases when you add more outputs shouldn't be surprising.

The event rate (and count) into Redis gets reduced by more than 50% when adding the TCP output.
See the attached graph; the dip is when TCP is added and removed (ES is reading from Redis).
Br
Mathias

If the TCP output destination can't keep up, could that stall the Logstash output thread and cause this behavior?
But shouldn't there be one thread per output type?

//MA

Yes, after changing the second output from TCP to UDP it all looks good!
So it sure looks like my consumer of TCP data can't keep up and then somehow stalls the Logstash output worker.
Can anyone explain this for me?

Thanks
Mathias

I am still a little new at this, but I believe it has to do with how the event gets processed. An event is created by the input, passed through the filters, and then passed into the output stage. It then needs to traverse each output before it is finished. Read this for more info on how it works.

Logstash only works with a handful of events at a time. Filters and outputs were sort of combined recently, but think of your inputs and outputs as separate applications that work independently from each other. Your input takes data and puts it into a queue. That queue has a max size (20 events, I think). Once it hits that max, the input stops receiving data and putting it into the queue, essentially putting back pressure on whatever is feeding it. The filter/output stage checks this queue and pulls in events for processing. However, it does not go back to the queue for more until it is completely done with the event. That means going through every single output.

In other words.

  1. Event is sent to Logstash, received by Input
  2. Event is processed by Input and put into queue.
  3. Filter/Output checks the queue for new events, finds one and pulls it into the processing pipeline.
  4. Event is sent to Elastic cluster
  5. Event is sent to TCP location.
  6. Filter/Output checks for new events in queue.

So what is happening is that step 5 is taking longer than anything else. But the worker can't go to step 6 and pull in a new event until it is done with step 5. So the TCP output slows down everything, all the way back to step 1. Remember that the input can't receive data if the queue is full. The entire pipeline is only as fast as its slowest output.
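The steps above can be sketched with a bounded queue and a single worker that must finish both outputs before taking the next event (a simplified model, not Logstash's actual code; the queue size and timings are made up):

```python
import queue
import threading
import time

# A single worker drains a bounded queue and must finish *both* outputs
# before taking the next event, so the slow "tcp" output throttles the
# whole pipeline, including the input.

events = queue.Queue(maxsize=20)   # bounded in-flight queue, like Logstash's
delivered = {"redis": 0, "tcp": 0}

def redis_output(event):
    delivered["redis"] += 1        # fast output: near-instant

def tcp_output(event):
    time.sleep(0.01)               # slow consumer: ~10 ms per event
    delivered["tcp"] += 1

def worker():
    while True:
        event = events.get()
        if event is None:
            break
        redis_output(event)        # step 4: fast output
        tcp_output(event)          # step 5: slow output blocks step 6

t = threading.Thread(target=worker)
t.start()

start = time.time()
for i in range(100):
    events.put(i)                  # input blocks once the queue is full
events.put(None)
t.join()
elapsed = time.time() - start

# Redis received every event, but only as fast as the TCP output allowed:
# 100 events take at least 100 * 10 ms = ~1 second end to end.
print(delivered["redis"], delivered["tcp"], elapsed > 0.9)
```

Both outputs get all 100 events, but the elapsed time is dictated entirely by the slow one, which is exactly the dip in the Redis graph.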

I can't really think of a good way to change it, though. For example, if Elastic can handle 10k events per second and TCP can handle 5k per second, there is a 5k-per-second difference. What would you do with those events? They would have to go somewhere. In theory you could add a queue between steps 4 and 5 and then go back to step 1 to receive more data. That would allow everything to be sent to Elastic immediately. But how big would you allow that queue to get? It would grow by 5k events every second in our example. If you had a big peak during the day it could catch up overnight, but if the data is consistently high it would just keep growing forever.
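That workaround can be sketched too: give the slow output its own thread and queue so the fast path is never blocked. The cost is exactly the one described above, since the intermediate queue grows without bound while the slow consumer lags (again a toy model with made-up timings, not how Logstash actually works):

```python
import queue
import threading
import time

# Decouple the slow output behind its own unbounded queue: the fast
# output proceeds immediately, while events for the slow one pile up.

tcp_queue = queue.Queue()          # unbounded buffer in front of slow output
sent = {"redis": 0, "tcp": 0}

def tcp_worker():
    while True:
        event = tcp_queue.get()
        if event is None:
            break
        time.sleep(0.01)           # slow consumer: ~100 events/s
        sent["tcp"] += 1

t = threading.Thread(target=tcp_worker)
t.start()

for i in range(100):
    sent["redis"] += 1             # fast output proceeds immediately
    tcp_queue.put(i)               # slow output only enqueued, not awaited

backlog = tcp_queue.qsize()        # most events are still waiting here
tcp_queue.put(None)
t.join()

print(sent["redis"], backlog > 50)
```

All 100 events reach the fast output right away, but a large backlog is left sitting in the intermediate queue, which is why this only helps with bursts, not sustained overload.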

Hi Brandon,
Thanks for a great explanation; this makes sense and explains the behavior I see. I just assumed every output had its own thread, but apparently not.
The short-term fix for me was using UDP, which doesn't care about the consumer side, but long term it of course needs to keep up.

Thanks!
Mathias