LS losing messages with persisted queue


#1

Hello Logstash experts,

I've encountered a very weird issue: I was indexing a database using the JDBC input, but realized that the document count in ES was lower than in the source table. It turned out that the reason was the persisted queue; with the memory queue the document count matched.
So I tried to reproduce the issue with a very basic configuration:
logstash.yml

queue.type: memory
#queue.type: persisted
log.level: info
path.logs: c:/temp/ls_logs
path.data: c:/temp/ls_data
  1. with queue.type: memory
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
10000
  2. with queue.type: persisted
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
1759
  3. with queue.type: persisted once again
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
1710
  4. and back to queue.type: memory
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
10000

Logstash version: 5.5.0
OS: Windows 10
Java: 1.8

Does anyone have an idea what could be wrong here?
Thanks!


#2

There must be something wrong with the worker count. Using the previous command with queue.type: persisted:
-w 2 -b 100 => 9802 lines
-w 2 -b 1000 => 11700 lines
and without the worker parameter:
-b 2000 => 10000 lines
-b 10 => 10000 lines

Update: it seems that the worker count and the batch size are irrelevant, because sometimes it works with the same settings and sometimes it does not.


#3

Sorry for bumping the thread, but this seems to be a very important issue.
I could reproduce it on Linux as well (Red Hat). Does it make sense to open a GitHub ticket?
Thanks!


(Guy Boertje) #4

Your generator example doesn't actually illustrate the problem at all...

The generator input is the one and only input that shuts down Logstash when it is done. With the memory queue, which is synchronous, the filter/output stage processes the last, 10,000th event as soon as it is produced. With the PQ, however, these two stages are decoupled: the input rapidly produces all 10,000 events, persists them, and then begins the shutdown. The filter/output stage only manages to process ~1,750 events before it is shut down; the rest remain in the PQ for next time.
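In other words, the two stages behave like a decoupled producer and consumer. Here is a toy sketch of that race (plain Python, not Logstash code; the names `input_stage` and `filter_output_stage` are purely illustrative):

```python
import queue
import threading

TOTAL = 10000

persisted = queue.Queue()   # stands in for the on-disk page files
processed = []
shutdown = threading.Event()

def input_stage():
    # like the generator input: write every event, then trigger shutdown
    for i in range(TOTAL):
        persisted.put(i)
    shutdown.set()

def filter_output_stage():
    # like the filter/output workers: stop as soon as shutdown is
    # signalled, even if events are still sitting in the queue
    while not shutdown.is_set():
        try:
            processed.append(persisted.get(timeout=0.001))
        except queue.Empty:
            pass

t1 = threading.Thread(target=input_stage)
t2 = threading.Thread(target=filter_output_stage)
t1.start(); t2.start()
t1.join(); t2.join()

print(f"processed: {len(processed)}, left in queue: {persisted.qsize()}")
```

The exact split is timing-dependent, but processed plus leftover always adds up to 10,000 -- the "missing" events are never lost, just still queued.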

This is by design. A queue.drain option will be available soon, to allow ephemeral, container-based instances to drain the queue after receiving an orderly shutdown command.
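Once it lands, enabling it should just be a logstash.yml setting along these lines (a sketch; the exact name and semantics may differ until it ships):

```yaml
queue.type: persisted
queue.drain: true   # wait for the queue to empty before completing shutdown
```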

Back to your original problem.

Are you shutting down Logstash after the JDBC input has read the DB contents, but before the filters/outputs have read all events out of the queue?

If you look at the path.queue contents, you should see the queue grow while the input writes faster than the filters read; once the input has stopped, the contents should shrink until there are two files left: the NNN.page file and its checkpoint file.
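One quick way to watch that happen is to poll the file sizes in the queue directory while Logstash runs. A minimal sketch (plain Python; the directory path is an assumption based on your path.data setting -- adjust for your install):

```python
import os

def queue_snapshot(queue_dir):
    """Map each file in the Logstash queue directory to its size in bytes."""
    return {name: os.path.getsize(os.path.join(queue_dir, name))
            for name in sorted(os.listdir(queue_dir))}

# While Logstash runs, call this every few seconds, e.g.:
#   queue_snapshot("c:/temp/ls_data/queue")
# Once the inputs stop, it should settle down to a single page file
# plus its checkpoint file.
```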


#5

Thank you very much for getting back to me!
I see, my "generator" example was not a proper way to reproduce the issue.
I cannot run the original tests at the moment, but as far as I remember I was shutting down the Logstash instance with Ctrl-C. The queue directory was 256 MB in size and basically contained one page file, but I need to double-check that.
And thanks a lot for the explanation of how the PQ works. That also explains why the number of processed events was sometimes more than 10,000.
I'll re-test with the JDBC input and let you know.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.