At-least-once delivery setup with Logstash and Kafka


(Johan Rask) #1

Hi,

Recently, we had an issue where all our servers, including Kafka and Logstash, were restarted at the same time. I was hoping that after the restart we would get all messages, but unfortunately some messages were lost. After some digging and replaying of messages, it seems that the Kafka consumer is committing offsets of messages that have not yet been delivered to Elasticsearch (or whatever output you use).

This actually makes sense, since there is an auto-commit every 5 seconds and I guess that messages can be "in transit" during that period.
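The suspected failure mode can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Logstash or the Kafka client is implemented: `simulate_crash` and all of its parameters are hypothetical names, the in-memory buffer is modelled as a simple deque, and the "auto-commit" is assumed to fire after every fetch, which is the worst case for a timer-based commit.

```python
from collections import deque


def simulate_crash(messages, commit_after_fetch, in_flight_limit, crash_after_delivered):
    """Return the messages that are lost when the consumer crashes and restarts.

    commit_after_fetch=True models an auto-commit that fires while events are
    still buffered in memory; False models committing only what the output has
    already accepted. Assumes crash_after_delivered <= len(messages).
    """
    in_flight = deque()   # events fetched from the topic but not yet delivered
    delivered = []        # events the output has accepted
    committed = 0         # offset the consumer resumes from after a restart
    fetched = 0           # offset of the next message to fetch

    while len(delivered) < crash_after_delivered:
        # Fill the in-memory buffer from the topic.
        while len(in_flight) < in_flight_limit and fetched < len(messages):
            in_flight.append(messages[fetched])
            fetched += 1
        if commit_after_fetch:
            committed = fetched          # commits offsets of undelivered events
        delivered.append(in_flight.popleft())
        if not commit_after_fetch:
            committed = len(delivered)   # commits only delivered events

    # Crash: everything still in the buffer is gone. After the restart the
    # consumer re-reads from the committed offset.
    redelivered = messages[committed:]
    return sorted(set(messages) - set(delivered) - set(redelivered))


msgs = list(range(10))
print(simulate_crash(msgs, True, 4, 3))   # [3, 4, 5]
print(simulate_crash(msgs, False, 4, 3))  # []
```

In the first call the three buffered events were already covered by the committed offset, so they are never read again. In the second call nothing is lost, at the cost of possibly re-reading events after a restart, which is what at-least-once allows.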

Is there something we can do to prevent this from happening, such as increasing the auto-commit interval or using shorter internal queues? Also, with enable_auto_commit=false, does this mean that the offset is never committed, since the plugin does not seem to provide a manual commit?

Thanks /Johan Rask


(Christian Dahlqvist) #2

Have you got a persistent queue configured for Logstash?


(Johan Rask) #3

Nope. Since we use Kafka, I consider that my buffer, and everything is designed for at-least-once.

Is that my only option?


(Christian Dahlqvist) #4

Logstash has an internal queue. If you do not use a persistent queue, this queue is held in memory, so events can be lost in the event of a crash. You can keep the persistent queue small and use it together with Kafka.


(Johan Rask) #5

We were just hoping that we could get away without it.

I will dig into it. I assume there is some kind of backpressure mechanism that prevents an input from overfilling the queue.
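For readers unfamiliar with the idea, backpressure from a bounded queue can be sketched as follows. This is a generic illustration using Python's standard `queue.Queue`, not Logstash's actual queue implementation; `run_pipeline`, `queue_size`, and the `None` end-of-stream marker are hypothetical choices made for the example.

```python
import queue
import threading


def run_pipeline(events, queue_size):
    """Push events through a bounded queue and report the peak queue depth."""
    buf = queue.Queue(maxsize=queue_size)
    delivered = []
    max_depth = 0

    def input_worker():
        for event in events:
            buf.put(event)   # blocks while the queue is full: this is the backpressure
        buf.put(None)        # hypothetical end-of-stream marker

    producer = threading.Thread(target=input_worker)
    producer.start()

    while True:
        max_depth = max(max_depth, buf.qsize())
        event = buf.get()
        if event is None:
            break
        delivered.append(event)   # stands in for the output stage

    producer.join()
    return delivered, max_depth


delivered, max_depth = run_pipeline(list(range(1000)), queue_size=8)
print(len(delivered), max_depth <= 8)
```

Because the input blocks instead of buffering without limit, the number of events that exist only in memory is capped by the queue size, which is why a small queue limits how much can be lost or replayed.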

Thanks @Christian_Dahlqvist!