I am running the logstash:7.17.4 docker image, and I've set
queue.type: persisted in my logstash.yml, leaving all other queue settings at the default.
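For reference, the relevant part of my logstash.yml — the commented values are the documented defaults, not something I set explicitly:

```yaml
# logstash.yml -- persistent queue settings
queue.type: persisted         # default is "memory"
# everything below is left at its default:
# queue.max_bytes: 1gb        # the hard cap the error message refers to
# queue.page_capacity: 64mb   # size of each page file on disk
```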
Most of the time, I see the normal, expected behavior, where my events are written to the PQ and drained by the output, creating backpressure if the queue fills up.
However, several times, after the 1GB queue has filled and logstash is restarted (or crashes), it continuously restarts unless I delete some or all of the pages, throwing an error like:
"Pipeline to-lambda current queue size (1140850688) is greater than 'queue.max_bytes' (1073741824)."
What could be causing logstash to "overfill" its queue?
Is there a configuration I can set to start rejecting events below the hard limit, so this doesn't happen?
pqcheck shows no issues, just, as expected, slightly too much data.
pqrepair has no effect on the crashloop.
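For what it's worth, the overage in the error above works out to exactly one default 64MB page (queue.page_capacity defaults to 64mb), which makes me suspect the queue is allowed to finish writing its current page past the limit:

```python
# Compare the reported queue size against queue.max_bytes;
# the difference is exactly one default page (64 MiB).
reported = 1140850688      # "current queue size" from the error
max_bytes = 1073741824     # queue.max_bytes default, 1gb
page = 64 * 1024 * 1024    # queue.page_capacity default, 64mb

overage = reported - max_bytes
print(overage, overage == page)   # 67108864 True
```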
If logstash can't send data, events accumulate in the queue as they arrive, and it grows to the limit.
BUT, I think something changed here: I restarted logstash after upgrading to 8.2.2 and got this same message, even though logstash had been stopped with zero or very few items in the queue. On startup it calculated its desired max queue size as queue.max_bytes times the number of pipelines, which was WAY more disk than I had. In your case, fix whatever is preventing logstash from sending data; then I'd increase the max to 2GB and see if it starts.
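If you want to try that, a minimal sketch (2gb is just my suggestion to get past the error, not a recommended value):

```yaml
# logstash.yml -- give the queue headroom above its current on-disk size
queue.type: persisted
queue.max_bytes: 2gb   # was the 1gb default; remember this is per pipeline,
                       # so total disk needed is roughly this times the
                       # number of pipelines using a persistent queue
```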
Apologies for the delay, I thought I'd set this up to notify me, but apparently not!
To clarify the scenario:
1) logstash runs on remote sites, publishing to the cloud. There are a number of cases, most notably initial setup, where it will have a multi-GB backlog of events to publish and will fill the default 1GB queue; that's expected.
2) What's not expected is that after a restart, the queue for my one pipeline is ~5% over the limit and logstash refuses to start.
Do I need to just ask for enough disk space for a 10-20GB queue, to try to ensure it NEVER hits the limit?
Can I not count on logstash + the beats/http inputs to stop accepting new events once the queue hits the limit, rather than getting into a state where, after a restart, I have to dump much of the queue or it simply can't start again?
I haven't had problems when something like winlogbeat is deployed to many servers and sends in a backlog of data (we limit the winlogbeat backlog to 72 hours). I don't remember whether we had persistent queues for that, but beats will "hold" events it can't send in the source files, so I tend not to use persistent logstash queues for beats->logstash->elastic pipelines. In fact, for anything new, I'm eliminating the logstash middleman wherever I can.
I do think there is a possible bug/design problem, at least in 8.x logstash, where a restart with full queues fails. My workaround was to keep max queue size x number of pipelines below the free disk space, with ~10% spare. But that results in overallocating disk and an artificial reduction in usable queue space per pipeline.
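As a rough sketch of that sizing rule (the 10% spare and the function name are my own convention, not anything Logstash-defined):

```python
def per_pipeline_queue_bytes(free_disk_bytes, n_pipelines, spare_pct=10):
    """Largest queue.max_bytes each pipeline can have while keeping
    max_bytes * n_pipelines under free disk, holding spare_pct back."""
    usable = free_disk_bytes * (100 - spare_pct) // 100
    return usable // n_pipelines

# e.g. 100 GiB free, 4 pipelines with persistent queues:
budget = per_pipeline_queue_bytes(100 * 1024**3, 4)
print(budget // 1024**2, "MiB per pipeline")  # 23040 MiB per pipeline
```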