Persistent Queue: no data lost regardless of settings?

(abel) #1

Hi there!
I was reading the blog post about the persistent queue.

This section is not clear to me:

Application / process level crash / interrupted shutdown.
This is the most common cause of potential data loss that PQ helps
solve. This happens when Logstash crashes with an exception or is killed
at the process level or is generally interrupted in a way which
prevents a safe shutdown. This would typically result in data loss when
using in-memory queuing. With PQ, no data will be lost, regardless of
the durability setting (see the queue.checkpoint.writes setting) and any
unacknowledged data will be replayed upon restarting Logstash.

As I understand it, the queue.checkpoint.writes setting is the one that determines when the 'buffered persistent queue' is written to disk.
With the default configuration (1024), the queue will not be on disk until that many events have arrived from the input (or the configured queue.checkpoint.interval has elapsed).
So whether data is lost on a "Logstash crash" actually depends on the 'durability settings'.

Could you confirm it?
Thank you

(Guy Boertje) #2

The data is written by the input to the Persistent Queue Java object. Inside this object, the event is serialized and written to an in-memory location in a memory-mapped (MM) file that the OS holds open. When it is time to checkpoint (1024 new events are in the MM file), the MM files are flushed and the checkpoint record is written. This record is a very small piece of data that is written as an atomic single disk sector write.

If Logstash crashes and restarts, it will open the same set of MM files, and because the OS held them open they will be in the same state as before the crash. As the PQ object opens the files, it looks for serialized events that are newer than what the last checkpoint file recorded. If it finds any, it will checkpoint them before it resumes accepting new events.

If the machine restarts suddenly, any events not checkpointed will be lost.

Summary: the durability settings are there to mitigate the risk of machine failure, not of a Logstash process crash.
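
To make the write / checkpoint / recover cycle described above more concrete, here is a minimal Python sketch of a toy queue. It is only an illustration under stated assumptions and not Logstash's actual code or on-disk format: `ToyQueue`, `PAGE_SIZE`, `CHECKPOINT_WRITES`, the length-prefixed record layout, and the file names are all hypothetical. Events go into a memory-mapped page file, a small checkpoint file is updated every few writes, and on reopening the queue scans for events written after the last checkpoint.

```python
import mmap
import os
import struct
import tempfile

PAGE_SIZE = 4096           # toy page size (assumption, not a Logstash value)
CHECKPOINT_WRITES = 3      # toy stand-in for queue.checkpoint.writes
LEN = struct.Struct("<I")  # 4-byte length prefix before each event


class ToyQueue:
    """Toy model: one memory-mapped page file plus a small checkpoint file."""

    def __init__(self, directory):
        self.page_path = os.path.join(directory, "page.0")
        self.ckpt_path = os.path.join(directory, "checkpoint")
        if not os.path.exists(self.page_path):
            with open(self.page_path, "wb") as f:
                f.truncate(PAGE_SIZE)  # zero-filled page
        self.file = open(self.page_path, "r+b")
        self.mm = mmap.mmap(self.file.fileno(), PAGE_SIZE)
        self.offset = self._read_checkpoint()
        self.pending = 0
        # Recovery: look for events written after the last checkpoint.
        self.recovered = self._scan_tail()
        if self.recovered:
            self.checkpoint()

    def _read_checkpoint(self):
        if not os.path.exists(self.ckpt_path):
            return 0
        with open(self.ckpt_path, "rb") as f:
            return LEN.unpack(f.read(LEN.size))[0]

    def _scan_tail(self):
        found = []
        while self.offset + LEN.size <= PAGE_SIZE:
            (size,) = LEN.unpack(self.mm[self.offset:self.offset + LEN.size])
            if size == 0:  # a zero length marks the end of written data
                break
            start = self.offset + LEN.size
            found.append(self.mm[start:start + size])
            self.offset = start + size
        return found

    def write(self, payload):
        # Toy only: payload must be non-empty and fit in the page.
        self.mm[self.offset:self.offset + LEN.size] = LEN.pack(len(payload))
        start = self.offset + LEN.size
        self.mm[start:start + len(payload)] = payload
        self.offset = start + len(payload)
        self.pending += 1
        if self.pending >= CHECKPOINT_WRITES:
            self.checkpoint()

    def checkpoint(self):
        self.mm.flush()  # push the mapped page to disk
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(LEN.pack(self.offset))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ckpt_path)  # swap in the new checkpoint record
        self.pending = 0

    def crash(self):
        """Simulate a process crash: release handles without checkpointing."""
        self.mm.close()
        self.file.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        q = ToyQueue(d)
        for i in range(1, 5):   # 3 events get checkpointed, the 4th does not
            q.write(b"event-%d" % i)
        q.crash()
        q2 = ToyQueue(d)        # "restart": reopen the same files
        recovered = q2.recovered
        q2.crash()
        return recovered
```

Calling `demo()` returns the one event that was written after the last checkpoint, which the reopened queue finds in the page file. The sketch cannot reproduce a sudden machine restart, where pages that were never flushed to disk would simply be gone; that is the case the durability settings address.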

(abel) #3

Thank you very much for your answer! :)
