Persistent Queue, no data lost regardless settings?

amiguez · August 24, 2017, 3:51pm

Hi there!
I've been reading the blog post about the persistent queue (PQ).

This section is not clear to me:

Application / process level crash / interrupted shutdown.
This is the most common cause of potential data loss that PQ helps
solve. This happens when Logstash crashes with an exception or is killed
at the process level or is generally interrupted in a way which
prevents a safe shutdown. This would typically result in data loss when
using in-memory queuing. With PQ, no data will be lost, regardless of
the durability setting (see the queue.checkpoint.writes setting) and any
unacknowledged data will be replayed upon restarting Logstash.

As far as I understand, the queue.checkpoint.writes setting is the one that determines when the 'buffered persistent queue' is written to disk.
With the default configuration (1024), the queue will not be on disk until the input has written that many events (or until the configured queue.checkpoint.interval elapses).
So whether data is lost on a "Logstash crash" would actually depend on the 'durability settings'.

Could you confirm it?
Thank you

guyboertje · September 4, 2017, 8:06am

The input writes the data to the Persistent Queue Java object. Inside this object, the event is serialized and written to an in-memory location in a memory-mapped (MM) file that the OS holds open. When it is time to checkpoint (1024 new events are in the MM file), the MM files are flushed and the checkpoint record is written. This record is a very small piece of data, written as an atomic single disk sector write.

If Logstash crashes and restarts, it will open the same set of MM files, and because the OS held them open they will be in the same state as before the crash. As the PQ object opens the files, it looks for serialized events that are newer than the last recorded checkpoint. If it finds any, it checkpoints them before it resumes accepting new events.

If the machine restarts suddenly, any events not checkpointed will be lost.

Summary: the durability settings are there to mitigate the risk of machine failure, not of a Logstash process crash.
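
To make the mechanism above a bit more concrete, here is a rough Python sketch of the idea. It is only a toy model, not the actual Logstash code: the class `ToyQueue`, the file names, the length-prefixed record layout, and the small `CHECKPOINT_WRITES` value are all made up for illustration. It shows events going into a memory-mapped page, a checkpoint being written after every N events, and a reopen that finds events newer than the last checkpoint and checkpoints them.

```python
import mmap
import os
import struct
import tempfile

PAGE_SIZE = 4096        # hypothetical page size for this toy model
CHECKPOINT_WRITES = 4   # stands in for queue.checkpoint.writes (Logstash default: 1024)


class ToyQueue:
    """Toy model only: one memory-mapped page plus a tiny checkpoint file."""

    def __init__(self, directory):
        self.page_path = os.path.join(directory, "page.0")
        self.cp_path = os.path.join(directory, "checkpoint")
        if not os.path.exists(self.page_path):
            with open(self.page_path, "wb") as f:
                f.truncate(PAGE_SIZE)  # zero-filled page
        self.file = open(self.page_path, "r+b")
        self.page = mmap.mmap(self.file.fileno(), PAGE_SIZE)
        self.checkpointed = self._read_checkpoint()
        # Scan the page for serialized events (length-prefixed records).
        self.offset, self.count = 0, 0
        while self.offset + 4 <= PAGE_SIZE:
            (length,) = struct.unpack("<I", self.page[self.offset:self.offset + 4])
            if length == 0:
                break
            self.offset += 4 + length
            self.count += 1
        # Events newer than the last checkpoint are checkpointed on open.
        self.recovered = self.count - self.checkpointed
        if self.recovered:
            self.checkpoint()

    def _read_checkpoint(self):
        if not os.path.exists(self.cp_path):
            return 0
        with open(self.cp_path) as f:
            return int(f.read())

    def write(self, payload):
        record = struct.pack("<I", len(payload)) + payload
        if self.offset + len(record) > PAGE_SIZE:
            raise ValueError("page full (the toy model has a single page)")
        # This lands in the mapped page; nothing is forced to disk yet.
        self.page[self.offset:self.offset + len(record)] = record
        self.offset += len(record)
        self.count += 1
        if self.count - self.checkpointed >= CHECKPOINT_WRITES:
            self.checkpoint()

    def checkpoint(self):
        self.page.flush()  # flush the mapped page to disk
        tmp = self.cp_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self.count))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.cp_path)  # small, atomic checkpoint update
        self.checkpointed = self.count


def demo():
    directory = tempfile.mkdtemp()
    q = ToyQueue(directory)
    for i in range(6):
        q.write(f"event-{i}".encode())
    before = q.checkpointed  # 4: events 5 and 6 are not checkpointed yet
    # Simulated process crash: q is abandoned without a final checkpoint.
    q2 = ToyQueue(directory)
    return before, q2.recovered, q2.checkpointed
```

Calling `demo()` writes six events with a checkpoint every four, abandons the first queue object, and reopens the same directory. The second object finds the two events that were written after the last checkpoint. What the sketch cannot show is the machine-failure case: if the whole machine goes down, the pages that were never flushed are gone, which is the risk the durability settings address.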

amiguez · September 5, 2017, 10:20am

Thanks a lot for your answer!

