Logstash stops processing when using persistent queues

I've updated logstash.yml to use persistent queues for a bit of extra redundancy, and have run into a problem where individual nodes just stop processing messages after a random amount of time, usually 30 minutes or more.

Kafka also shows the partition they're reading from as stopped.

The nodes themselves are still doing something, as they are still producing metrics, so I suspect it's related to the Kafka input somehow.

Once a node stops, the remaining nodes pick up the messages from its Kafka partition after a delay of about 5 minutes. If left long enough, however, all the nodes eventually stop processing.

I really have no idea where to start looking as there is nothing of note in logstash-plain.log.
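
Since the log is empty, one place that might show more is the Logstash monitoring API. Below is a rough sketch in Python of how two snapshots of the node stats could be compared to see which pipelines have stopped emitting events and how many events are sitting in the queue. It assumes the API is on its default address (localhost:9600); the helper names (fetch_stats, summarize, stalled) are made up for illustration, and the field names are read defensively because the stats layout may differ between Logstash versions.

```python
import json
from urllib.request import urlopen

# Assumption: the Logstash monitoring API is listening on its default address.
STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_stats(url=STATS_URL):
    """Fetch and parse the node stats JSON from the monitoring API."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize(stats):
    """Return {pipeline_name: (events_in, events_out, queued_events)}.

    Missing fields come back as None, since the exact layout of the
    stats document is an assumption here.
    """
    summary = {}
    for name, pipeline in stats.get("pipelines", {}).items():
        events = pipeline.get("events", {})
        queue = pipeline.get("queue", {})
        summary[name] = (events.get("in"), events.get("out"), queue.get("events"))
    return summary


def stalled(before, after):
    """Names of pipelines whose 'out' counter did not advance between snapshots."""
    return sorted(
        name
        for name, (_, out, _) in after.items()
        if name in before and before[name][1] == out
    )
```

Calling summarize(fetch_stats()) twice, a minute or so apart, and passing both results to stalled() would list the pipelines that made no progress in between. A queued-events count that stays high while "out" stays flat would point at the queue or the outputs rather than at the Kafka input.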

logstash.yml changes are:

queue.type: persisted
queue.page_capacity: 10mb
queue.max_bytes: 10mb
queue.checkpoint.writes: 256

If I disable persistent queues, Logstash goes back to behaving itself. Enable queues again, and I get more stoppages.

Any ideas what could be causing this?

Logstash 6.2.2-1
Kafka 0.11.0.2 (Scala 2.11 build)

Bump: any ideas? This is unusable for me at the moment.
