LS losing messages with persisted queue


#1

Hello Logstash experts,

I've encountered a very weird issue: I was indexing a database using the JDBC input, but realized that the document count in ES was lower than in the source table. It turned out that the reason was the persisted queue; with the memory queue the document count matched.
So I tried to reproduce the issue with a very basic configuration:
logstash.yml

queue.type: memory
#queue.type: persisted
log.level: info
path.logs: c:/temp/ls_logs
path.data: c:/temp/ls_data
  1. with queue.type: memory
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
10000
  2. with queue.type: persisted
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
1759
  3. with queue.type: persisted once again
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
1710
  4. and back to queue.type: memory
C:\user\programms\logstash>bin\logstash --path.settings file:///C:/user/etc/logstash_test -e "input { generator { count => 10000 } } filter {uuid {target => '@uuid'} } output {stdout {codec => line}}" -w 1 -b 1 | find /c "Hello"
10000

Logstash version: 5.5.0
OS: Windows 10
Java: 1.8

Does anyone have an idea what could be wrong here?
Thanks!


#2

There must be something wrong with the worker count. Using the previous command with queue.type: persisted:
-w 2 -b 100 => 9802 lines
-w 2 -b 1000 => 11700 lines
and without the worker parameter:
-b 2000 => 10000 lines
-b 10 => 10000 lines

Update: it seems that the worker count and the batch size are irrelevant, because sometimes it works with the same settings and sometimes it does not.


#3

Sorry for bumping the thread, but this seems to be a very important issue.
I could reproduce it on Linux as well (Red Hat). Does it make sense to open a GitHub ticket?
Thanks!


(Guy Boertje) #4

Your generator example doesn't actually illustrate the problem at all...

The generator input is the one and only input that shuts down Logstash when it is done. With the memory queue, which is synchronous, the filter/output stage processes the last, 10,000th event as soon as it is produced. With the PQ, however, these two stages are decoupled: the input rapidly produces all 10,000 events, persists them, and then begins the shutdown. The filter/output stage only manages to process ~1,750 events before it is shut down; the rest remain in the PQ for next time.
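In other words, the two stages behave like a decoupled producer and consumer. Here is a toy sketch of that race (plain Python, not Logstash code; the names `input_stage` and `filter_output_stage` are purely illustrative):

```python
import queue
import threading

TOTAL = 10000

persisted = queue.Queue()   # stands in for the on-disk page files
processed = []
shutdown = threading.Event()

def input_stage():
    # like the generator input: write every event, then trigger shutdown
    for i in range(TOTAL):
        persisted.put(i)
    shutdown.set()

def filter_output_stage():
    # like the filter/output workers: stop as soon as shutdown is
    # signalled, even if events are still sitting in the queue
    while not shutdown.is_set():
        try:
            processed.append(persisted.get(timeout=0.001))
        except queue.Empty:
            pass

t1 = threading.Thread(target=input_stage)
t2 = threading.Thread(target=filter_output_stage)
t1.start(); t2.start()
t1.join(); t2.join()

print(f"processed: {len(processed)}, left in queue: {persisted.qsize()}")
```

The exact split is timing-dependent, but processed plus leftover always adds up to 10,000 -- the "missing" events are never lost, just still queued.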

This is by design. A queue.drain option will be available soon, to allow ephemeral, container-based instances to drain the queue after receiving an orderly shutdown command.
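Once it lands, enabling it should just be a logstash.yml setting along these lines (a sketch; the exact name and semantics may differ until it ships):

```yaml
queue.type: persisted
queue.drain: true   # wait for the queue to empty before completing shutdown
```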

Back to your original problem.

Are you shutting down Logstash after the JDBC input has read the DB contents, but before the filters/outputs have read all events out of the queue?

If you look at the path.queue contents, you should see the queue grow while the input writes faster than the filters read; once the input has stopped, the contents should shrink until there are two files left: the NNN.page file and its checkpoint file.
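One quick way to watch that happen is to poll the file sizes in the queue directory while Logstash runs. A minimal sketch (plain Python; the directory path is an assumption based on your path.data setting -- adjust for your install):

```python
import os

def queue_snapshot(queue_dir):
    """Map each file in the Logstash queue directory to its size in bytes."""
    return {name: os.path.getsize(os.path.join(queue_dir, name))
            for name in sorted(os.listdir(queue_dir))}

# While Logstash runs, call this every few seconds, e.g.:
#   queue_snapshot("c:/temp/ls_data/queue")
# Once the inputs stop, it should settle down to a single page file
# plus its checkpoint file.
```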


#5

Thank you very much for getting back to me!
I see, my "generator" example was not a proper way to reproduce the issue.
I cannot run the original tests at the moment, but as far as I remember I was shutting down the Logstash instance with Ctrl-C. The queue directory was 256 MB in size and basically contained one page file, but I need to double-check that.
And thanks a lot for the explanation of how the PQ works. That also explains why the number of processed events was sometimes more than 10,000.
I'll re-test with the JDBC input and let you know.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.