Hi,
We've had several cases where our Logstash pipeline crashes for reasons we haven't pinned down, tries to restart, and then fails to start again because it can't create the persistent queue. When this happens, I have to go in and delete the page and checkpoint files before the pipeline will start. We're running Logstash 5.6.3 in Docker with 4 workers, and our pipeline has two jdbc inputs and two elasticsearch outputs.
This has happened a few times, and the logs look a bit different each time, so I'll focus on the most recent case.
Here's a listing of the queue directory from when the pipeline was unable to start:
ls -l
34 Nov 24 15:27 checkpoint.128
34 Nov 24 19:13 checkpoint.129
34 Nov 24 20:33 checkpoint.head
262144000 Nov 24 15:27 page.128
262144000 Nov 24 20:33 page.130
checkpoint.head and page.130 were last modified at 20:33, which is when the pipeline crashed. Data up until that time did make it into Elasticsearch.
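To see what state the queue recorded for each page, the checkpoint files can be decoded. This is a minimal Python sketch, assuming the 34-byte layout of Logstash 5.x checkpoints (version, page number, first unacked page number, first unacked seqNum, minimum seqNum, element count, then a CRC32 of the preceding 30 bytes, all big-endian). That layout is an assumption based on the 5.x source, not a documented format; `Checkpoint` and `parse_checkpoint` are hypothetical names. The 34-byte total does match the sizes in the listing above.

```python
import struct
import sys
import zlib
from dataclasses import dataclass

# ASSUMED layout of a Logstash 5.x checkpoint file, all big-endian:
#   short version, int pageNum, int firstUnackedPageNum,
#   long firstUnackedSeqNum, long minSeqNum, int elementCount,
#   followed by an int CRC32 of those first 30 bytes.
CHECKPOINT_BODY = ">hiiqqi"
BODY_SIZE = struct.calcsize(CHECKPOINT_BODY)  # 30 bytes
CHECKPOINT_SIZE = BODY_SIZE + 4               # 34 bytes, same as in the ls output


@dataclass
class Checkpoint:  # hypothetical name, for illustration only
    version: int
    page_num: int
    first_unacked_page_num: int
    first_unacked_seq_num: int
    min_seq_num: int
    element_count: int
    checksum_ok: bool


def parse_checkpoint(data: bytes) -> Checkpoint:
    """Decode one checkpoint file and verify its trailing CRC32."""
    if len(data) != CHECKPOINT_SIZE:
        raise ValueError(f"expected {CHECKPOINT_SIZE} bytes, got {len(data)}")
    fields = struct.unpack(CHECKPOINT_BODY, data[:BODY_SIZE])
    # Read the stored checksum unsigned so it compares directly with zlib.crc32.
    (stored,) = struct.unpack(">I", data[BODY_SIZE:])
    computed = zlib.crc32(data[:BODY_SIZE]) & 0xFFFFFFFF
    return Checkpoint(*fields, checksum_ok=(stored == computed))


if __name__ == "__main__":
    # e.g. checkpoint.128 checkpoint.129 checkpoint.head
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, parse_checkpoint(f.read()))
```

Passing the three checkpoint files as arguments prints one decoded record per file, which would show which seqNum range each page is expected to hold.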
First, there was a stack trace:
IOError: java.io.IOException: computed checksum=-1147247917 != checksum for file=0
read at org/logstash/ackedqueue/io/AbstractByteBufferPageIO.java:241
readBatch at org/logstash/ackedqueue/Page.java:55
_readPageBatch at org/logstash/ackedqueue/Queue.java:531
readBatch at org/logstash/ackedqueue/Queue.java:522
read_batch at org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:163
read_batch at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:68
read_batch at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:68
read_next at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:260
read_next at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:260
read_batch at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:172
read_batch at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:172
read_batch at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:171
read_batch at /usr/share/logstash/logstash-core/lib/logstash/util/wrapped_acked_queue.rb:171
worker_loop at /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:377
start_workers at /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:342
run at java/lang/Thread.java:748
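Both this checksum error and a "seqNum ... is expected to be ..." error come from per-element checks the queue runs while reading a page. A minimal Python sketch of those checks, assuming the Logstash 5.x page layout (a 1-byte version header, then for each event a big-endian long seqNum, an int length, the serialized event bytes, and a CRC32 of those bytes), shows that a never-written, zero-filled region would read back as seqNum 0 or a stored checksum of 0. That is consistent with, though not proof of, the head page not being fully flushed before the crash. The layout is an assumption from the 5.x source, and `verify_page` is a hypothetical helper.

```python
import struct
import sys
import zlib

# ASSUMED Logstash 5.x page layout (big-endian): a 1-byte version header,
# then one record per event: long seqNum, int length, <length> bytes of
# serialized event, int CRC32 of those event bytes.
HEADER_SIZE = 1
SEQ_AND_LEN = struct.Struct(">qi")
CHECKSUM = struct.Struct(">I")


def verify_page(page: bytes, min_seq_num: int, element_count: int):
    """Walk element_count records starting at min_seq_num (hypothetical helper).

    Returns None if every record checks out, otherwise a message for the
    first bad record. Checksums are printed unsigned here; Logstash prints
    them as signed Java ints.
    """
    offset = HEADER_SIZE
    for i in range(element_count):
        expected = min_seq_num + i
        if offset + SEQ_AND_LEN.size > len(page):
            return f"element {expected}: header runs past end of page"
        seq_num, length = SEQ_AND_LEN.unpack_from(page, offset)
        # A zero-filled (never written) region reads back as seqNum 0.
        if seq_num != expected:
            return f"Element seqNum {seq_num} is expected to be {expected}"
        data_start = offset + SEQ_AND_LEN.size
        data_end = data_start + length
        if length < 0 or data_end + CHECKSUM.size > len(page):
            return f"element {expected}: invalid length {length}"
        (stored,) = CHECKSUM.unpack_from(page, data_end)
        computed = zlib.crc32(page[data_start:data_end]) & 0xFFFFFFFF
        if computed != stored:
            return f"element {expected}: computed checksum={computed} != stored={stored}"
        offset = data_end + CHECKSUM.size
    return None


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: <page file> <minSeqNum> <elementCount>, values taken from the
    # checkpoint that covers that page
    with open(sys.argv[1], "rb") as f:
        result = verify_page(f.read(), int(sys.argv[2]), int(sys.argv[3]))
    print(result or "all elements OK")
```

Reading a whole 250 MB page into memory is fine for a one-off diagnosis; the minSeqNum and elementCount arguments would come from the checkpoint that covers the page being checked.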
Logstash then crashed and attempted a restart, which failed with this error:
[2017-11-24T20:33:48,450][ERROR][logstash.pipeline ] Logstash failed to create queue exception=org.logstash.ackedqueue.io.AbstractByteBufferPageIO$PageIOInvalidElementException: Element seqNum 0 is expected to be 78089562, backtrace [org/logstash/ackedqueue/io/AbstractByteBufferPageIO.java:143:in `readNextElement', org/logstash/ackedqueue/io/AbstractByteBufferPageIO.java:81:in `open', org/logstash/ackedqueue/io/MmapPageIO.java:33:in `open', org/logstash/ackedqueue/Queue.java:206:in `open', org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary.java:132:in `ruby_open', org/logstash/ackedqueue/ext/JrubyAckedQueueExtLibrary$RubyAckedQueue .......
There's more to the error, but it's really long. Let me know if I should post the rest of it!
Any help is greatly appreciated! Thanks!