Persisted Queue (Re-Play of existing Data)


(Nate) #1

I have about 15 GB of persisted queues that I need to "replay"/"re-run" through the pipeline. However, after a restart/reboot/etc., Logstash does not begin to re-ingest the queued data. All the data is still there, and the Logstash YAML points to the correct directory (using "/data/logstash").
Anything I can do to cause a re-ingest/"replay"/"re-run"?

I have read this https://www.elastic.co/guide/en/logstash/current/upgrading-logstash-pqs.html
which would make it seem like all I have to do is restart and make sure path is pointed at the right directory.

We have X-Pack, and monitoring for that Logstash node shows that there are events queued.


(Colin Surprenant) #2

Hello @neu5ron,

First, are you seeing any error logs? If you do please share them. Also, which version of logstash are you running and on which platform?

I am not sure I understand your issue here. In "normal" operation, if data is queued up in the PQ and you shut down Logstash and restart it, any data in the PQ that has not been processed will be picked up and ingested into Logstash.

Now, in between that shutdown and restart, you can move the PQ directory to some other path. All you need to do is update the path.queue setting in logstash.yml. By default, path.queue points to path.data/queue and path.data is data/ in the Logstash home dir, so the default queue dir is data/queue/ in the Logstash home dir. There is a small subtlety here: the actual PQ data files (which are a series of page.X and checkpoint.X files) need to be in a dir named after the pipeline name inside the path.queue dir. The default pipeline name is main, so the actual PQ data files will be in data/queue/main.

So if you change your path.queue to something like path.queue: /tmp in logstash.yml, then your PQ data files need to be in /tmp/main/ (if you haven't changed your pipeline name).
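To make the layout above concrete, here is a minimal Ruby sketch that builds the directory in which the PQ files are expected and lists any page/checkpoint files found there. The helper names `pq_data_dir` and `pq_files` are hypothetical and only illustrate the path rule described in this post; they are not part of Logstash.

```ruby
# Hypothetical helpers illustrating the rule: <path.queue>/<pipeline name>/
def pq_data_dir(path_queue, pipeline_id = "main")
  File.join(path_queue, pipeline_id)
end

# List page.* and checkpoint.* files in a directory (empty array if none exist).
def pq_files(dir)
  Dir.glob(File.join(dir, "{page,checkpoint}.*")).map { |f| File.basename(f) }.sort
end

if __FILE__ == $PROGRAM_NAME
  dir = pq_data_dir("/tmp") # path.queue: /tmp, default pipeline name
  files = pq_files(dir)
  puts dir
  puts files.empty? ? "no PQ files found in #{dir}" : files.first(5)
end
```

If the listing comes back empty for the directory you expect, the files are probably one level too high or too low relative to path.queue.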

Let me know if that solves your problem.
Colin


(Nate) #3

Thanks for the reply.
So none of the data/directories were moved. It is just not processing the queued data. Monitoring shows that there are events queued, so I assume Logstash knows the data is there :) but it is not doing anything about it.
I tried restarting/etc..
possibly related here:


(Colin Surprenant) #4

@neu5ron Can you then please provide the relevant error logs that prevent LS from starting? Also, can you please do an ls -la of the queue data files and report it here too?


(Nate) #5

There are no error logs unfortunately...

path

/data/logstash/queue/main

-rw-r--r-- 1 logstash logstash 34 Dec 11 20:15 checkpoint.4514
-rw-r--r-- 1 logstash logstash 34 Dec 11 21:24 checkpoint.4515
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:35 checkpoint.4516
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:38 checkpoint.4517
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:40 checkpoint.4518
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:43 checkpoint.4519
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:45 checkpoint.4520
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:48 checkpoint.4521
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:51 checkpoint.4522
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:54 checkpoint.4523
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:56 checkpoint.4524
-rw-r--r-- 1 logstash logstash 34 Dec 11 19:59 checkpoint.4525
-rw-r--r-- 1 logstash logstash 34 Dec 11 20:01 checkpoint.4526
-rw-r--r-- 1 logstash logstash 34 Dec 11 20:04 checkpoint.4527
-rw-r--r-- 1 logstash logstash 34 Dec 11 20:07 checkpoint.4528
-rw-r--r-- 1 logstash logstash 34 Dec 11 20:11 checkpoint.4529
-rw-r--r-- 1 logstash logstash 34 Dec 11 20:14 checkpoint.4530
-rw-r--r-- 1 logstash logstash 34 Dec 11 20:16 checkpoint.4531
-rw-r--r-- 1 logstash logstash 34 Dec 11 21:25 checkpoint.4532
-rw-r--r-- 1 logstash logstash 34 Dec 11 21:55 checkpoint.head
-rw-r--r-- 1 logstash logstash 0 Nov 3 16:05 .lock
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:30 page.4514
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:32 page.4515
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:35 page.4516
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:38 page.4517
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:40 page.4518
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:43 page.4519
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:45 page.4520
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:48 page.4521
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:51 page.4522
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:54 page.4523
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:56 page.4524
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 19:59 page.4525
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 20:01 page.4526
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 20:04 page.4527
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 20:07 page.4528
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 20:09 page.4529
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 20:14 page.4530
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 20:16 page.4531
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 21:25 page.4532
-rw-r--r-- 1 logstash logstash 262144000 Dec 11 21:57 page.4533


(Colin Surprenant) #6

@neu5ron, which version of logstash are you running and on which platform?

First, make sure you have a safe copy/backup/tar.gz of the queue directory stored somewhere so we can run a few tests without being afraid of losing it.

This is very intriguing. Can you please run the following command from your Logstash home directory (just change dir = "data/queue/main" if you changed the queue path) and report the result here?

vendor/jruby/bin/jruby -rpp -e 'dir = "data/queue/main"; Dir.glob("#{dir}/checkpoint.*").sort_by { |x| x[/[0-9]+$/].to_i}.each { |checkpoint| data = File.read(checkpoint); version, page, firstUnackedPage, firstUnackedSeq, minSeq, elementCount, crc32 = data.unpack("nNNQ>Q>NN"); fa = firstUnackedSeq >= (minSeq + elementCount); ps = File.exist?("#{dir}/page.#{page}") ? File.size("#{dir}/page.#{page}") : nil; print("#{File.basename(checkpoint)}, #{fa ? "FA" : "UN"}, size: #{ps ? ps : "NA"}, "); p(version: version, page: page, firstUnackedPage: firstUnackedPage, firstUnackedSeq: firstUnackedSeq, minSeq: minSeq, elementCount: elementCount, crc32: crc32) }'
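For readability, the same check can be sketched as a multi-line Ruby script. The field order and the unpack string "nNNQ>Q>NN" are taken directly from the one-liner above; the method names and snake_case keys are assumptions introduced for illustration, and this is only a sketch of what the command does, not an official Logstash tool.

```ruby
# Field layout taken from the one-liner's unpack string (34 bytes per checkpoint).
CHECKPOINT_FORMAT = "nNNQ>Q>NN"
FIELDS = %i[version page first_unacked_page first_unacked_seq
            min_seq element_count crc32].freeze

# Decode the raw bytes of a checkpoint file into a hash.
def parse_checkpoint(bytes)
  FIELDS.zip(bytes.unpack(CHECKPOINT_FORMAT)).to_h
end

# Same test the one-liner uses to print "FA" (fully acked) or "UN".
def fully_acked?(cp)
  cp[:first_unacked_seq] >= cp[:min_seq] + cp[:element_count]
end

# checkpoint.head has no numeric suffix, so it sorts first (nil.to_i == 0).
def checkpoint_files(dir)
  Dir.glob(File.join(dir, "checkpoint.*")).sort_by { |f| f[/[0-9]+$/].to_i }
end

def report(dir)
  checkpoint_files(dir).each do |path|
    cp = parse_checkpoint(File.binread(path))
    page = File.join(dir, "page.#{cp[:page]}")
    size = File.exist?(page) ? File.size(page) : "NA"
    state = fully_acked?(cp) ? "FA" : "UN"
    puts "#{File.basename(path)}, #{state}, size: #{size}, #{cp}"
  end
end

report(ARGV[0] || "data/queue/main") if __FILE__ == $PROGRAM_NAME
```

Run it against a copy of the queue directory, passing the directory as the first argument if it is not data/queue/main.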

Colin


(Nate) #7

logstash 5.6.4
CentOS 7

Output:

checkpoint.head, UN, size: 262144000, {:version=>1, :page=>4533, :firstUnackedPage=>4514, :firstUnackedSeq=>2786584716, :minSeq=>2786584716, :elementCount=>29697, :crc32=>1997571703}
checkpoint.4514, UN, size: 262144000, {:version=>1, :page=>4514, :firstUnackedPage=>0, :firstUnackedSeq=>2776922292, :minSeq=>2776922292, :elementCount=>512183, :crc32=>2774688018}
checkpoint.4515, UN, size: 262144000, {:version=>1, :page=>4515, :firstUnackedPage=>0, :firstUnackedSeq=>2777434475, :minSeq=>2777434475, :elementCount=>510915, :crc32=>2115720018}
checkpoint.4516, UN, size: 262144000, {:version=>1, :page=>4516, :firstUnackedPage=>0, :firstUnackedSeq=>2777945390, :minSeq=>2777945390, :elementCount=>511181, :crc32=>2171335398}
checkpoint.4517, UN, size: 262144000, {:version=>1, :page=>4517, :firstUnackedPage=>0, :firstUnackedSeq=>2778456571, :minSeq=>2778456571, :elementCount=>511990, :crc32=>393196180}
checkpoint.4518, UN, size: 262144000, {:version=>1, :page=>4518, :firstUnackedPage=>0, :firstUnackedSeq=>2778968561, :minSeq=>2778968561, :elementCount=>513328, :crc32=>593535511}
checkpoint.4519, UN, size: 262144000, {:version=>1, :page=>4519, :firstUnackedPage=>0, :firstUnackedSeq=>2779481889, :minSeq=>2779481889, :elementCount=>511126, :crc32=>2132449713}
checkpoint.4520, UN, size: 262144000, {:version=>1, :page=>4520, :firstUnackedPage=>0, :firstUnackedSeq=>2779993015, :minSeq=>2779993015, :elementCount=>514380, :crc32=>3449672172}
checkpoint.4521, UN, size: 262144000, {:version=>1, :page=>4521, :firstUnackedPage=>0, :firstUnackedSeq=>2780507395, :minSeq=>2780507395, :elementCount=>517681, :crc32=>1881891505}
checkpoint.4522, UN, size: 262144000, {:version=>1, :page=>4522, :firstUnackedPage=>0, :firstUnackedSeq=>2781025076, :minSeq=>2781025076, :elementCount=>519507, :crc32=>619347250}
checkpoint.4523, UN, size: 262144000, {:version=>1, :page=>4523, :firstUnackedPage=>0, :firstUnackedSeq=>2781544583, :minSeq=>2781544583, :elementCount=>515282, :crc32=>4090127464}
checkpoint.4524, UN, size: 262144000, {:version=>1, :page=>4524, :firstUnackedPage=>0, :firstUnackedSeq=>2782059865, :minSeq=>2782059865, :elementCount=>516695, :crc32=>4181336340}
checkpoint.4525, UN, size: 262144000, {:version=>1, :page=>4525, :firstUnackedPage=>0, :firstUnackedSeq=>2782576560, :minSeq=>2782576560, :elementCount=>517761, :crc32=>2404136133}
checkpoint.4526, UN, size: 262144000, {:version=>1, :page=>4526, :firstUnackedPage=>0, :firstUnackedSeq=>2783094321, :minSeq=>2783094321, :elementCount=>513797, :crc32=>332181937}
checkpoint.4527, UN, size: 262144000, {:version=>1, :page=>4527, :firstUnackedPage=>0, :firstUnackedSeq=>2783608118, :minSeq=>2783608118, :elementCount=>514462, :crc32=>4019580203}
checkpoint.4528, UN, size: 262144000, {:version=>1, :page=>4528, :firstUnackedPage=>0, :firstUnackedSeq=>2784122580, :minSeq=>2784122580, :elementCount=>516273, :crc32=>1555102885}
checkpoint.4529, UN, size: 262144000, {:version=>1, :page=>4529, :firstUnackedPage=>0, :firstUnackedSeq=>2784638853, :minSeq=>2784638853, :elementCount=>393038, :crc32=>3851706649}
checkpoint.4530, UN, size: 262144000, {:version=>1, :page=>4530, :firstUnackedPage=>0, :firstUnackedSeq=>2785031891, :minSeq=>2785031891, :elementCount=>516460, :crc32=>3526237324}
checkpoint.4531, UN, size: 262144000, {:version=>1, :page=>4531, :firstUnackedPage=>0, :firstUnackedSeq=>2785548351, :minSeq=>2785548351, :elementCount=>519777, :crc32=>2427734882}
checkpoint.4532, UN, size: 262144000, {:version=>1, :page=>4532, :firstUnackedPage=>0, :firstUnackedSeq=>2786068128, :minSeq=>2786068128, :elementCount=>516588, :crc32=>1628518855}
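As a worked example of how the FA/UN flag in this output is derived, the sketch below plugs in the values reported for checkpoint.4514 and checkpoint.4515. The `fully_acked?` test is the comparison used in the command from post #6; `unacked_upper_bound` is a hypothetical helper and assumes events are acknowledged in sequence order, so treat its result as an upper bound rather than an exact count.

```ruby
# Values copied from the checkpoint.4514 and checkpoint.4515 lines above.
cp_4514 = { first_unacked_seq: 2_776_922_292, min_seq: 2_776_922_292, element_count: 512_183 }
cp_4515 = { first_unacked_seq: 2_777_434_475, min_seq: 2_777_434_475, element_count: 510_915 }

# Comparison used by the diagnostic command to print "FA" or "UN".
def fully_acked?(cp)
  cp[:first_unacked_seq] >= cp[:min_seq] + cp[:element_count]
end

# Assumption: in-order acking, so everything from first_unacked_seq on is pending.
def unacked_upper_bound(cp)
  [cp[:min_seq] + cp[:element_count] - cp[:first_unacked_seq], 0].max
end

puts fully_acked?(cp_4514)        # false, which is why the line is tagged UN
puts unacked_upper_bound(cp_4514) # 512183, i.e. the whole page
# Sequence numbers are contiguous: page 4514 ends where page 4515 begins.
puts cp_4514[:min_seq] + cp_4514[:element_count] == cp_4515[:min_seq] # true
```

In each listed page checkpoint, firstUnackedSeq equals minSeq, so by this test no page has had any event acknowledged.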

