What happens when Redis output plugin can't deliver a message?

Hello there, I'm trying to understand what happens when an output plugin can't deliver a message.

We are using logstash-output-redis to send batched messages to Redis and I would like to understand what happens if Redis is down for a few minutes. Do we lose the event?
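
For context, a batched redis output is configured roughly like this; the host, key, and batch sizes below are just placeholders, not our real values:

    output {
      redis {
        host          => ["redis.example.internal"]  # placeholder host
        data_type     => "list"
        key           => "logstash"                  # placeholder key
        batch         => true   # buffer events and flush them with a single RPUSH
        batch_events  => 50     # flush once this many events are buffered...
        batch_timeout => 5      # ...or after this many seconds, whichever comes first
      }
    }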

I went through the source code of the plugin to try to understand what happens. I'm not a Logstash or Ruby expert, so I'm probably missing a lot of context, but from what I could understand, this is what happens:

  • The plugin receives a message.
  • It pushes the message into a buffer.
  • When there are enough messages, the buffer is flushed.
  • The flush basically means an RPUSH into Redis.
  • The buffer automatically retries failed flushes.

So if Redis is down, the flush will likely fail and the buffer will retry it. From the source code it seems it will try indefinitely.

And what happens when the buffer gets full? It seems that it will block and wait for the buffer to be flushed; again, it will wait indefinitely.

So now Redis is down, the buffer is full, and calls to output events are blocking. How long can this continue? What happens to Logstash?

Does it continue consuming events from the inputs? Are the events accumulating in memory somewhere?

Thank you!

Generally, Logstash has an at-least-once delivery model. As you say, the code for the redis output retries if the connection is down, and sleeps if it detects congestion.

If the output stops accepting new events, Logstash will queue them. If the queues back up, Logstash will stop reading events from the inputs.
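
For the redis output specifically, the reconnect and congestion behaviour is exposed as plugin options; roughly something like this (values are illustrative, check the plugin docs for your version):

    output {
      redis {
        host                 => ["redis.example.internal"]  # placeholder
        data_type            => "list"
        key                  => "logstash"
        reconnect_interval   => 1   # seconds to wait before reconnecting after a failed write
        congestion_threshold => 0   # 0 disables the check; > 0 = max list length before the output sleeps
        congestion_interval  => 1   # how often, in seconds, to re-check the list length
      }
    }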


Thanks for the link. It's clearer now, and it helped me find the real problem with our setup.

We are running Logstash and Redis on Kubernetes: multiple Logstash pods and one Redis pod. It's possible that during a node drain (due to a node update, for instance) or rebalancing, the pods will be terminated and moved to another node.

The real problem I identified is that if Redis goes down before Logstash, Logstash can't flush the events before it is itself shut down, so those events are dropped. I believe we need to configure a persistent queue to get around this problem.
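
If I understand the docs correctly, that would be something like this in logstash.yml (the path and sizes below are just examples):

    # logstash.yml
    queue.type: persisted      # spool in-flight events to disk instead of memory
    path.queue: /usr/share/logstash/data/queue   # example path; must live on a volume that survives pod restarts
    queue.max_bytes: 1gb       # back-pressure kicks in once the queue reaches this size
    queue.drain: true          # on shutdown, wait for the queue to empty before exiting

On Kubernetes I guess the important part is that path.queue points at a persistent volume, otherwise the queue disappears with the pod.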

Any recommendations? Thanks.
