Logstash: Persistent Queue Behaviour

Hi,
I've seen some behaviour with persistent queues that surprises me. I noticed it when throughput on a Logstash node became capped after the EBS Burst Balance for that node's volume was exhausted. Here is a screenshot from CloudWatch illustrating the behaviour:

Given that the persistent queue length is usually either very low or zero, it makes me wonder: is every inbound message being written to and read back from disk before being indexed into Elasticsearch?

The persistent queue overview that correlates with the above screenshot can be seen here:

So, can someone clarify what my expectations should be when using persistent queues on EBS volumes? Why does persistent queueing eat so much burst quota even when the node isn't falling particularly far behind?

Thx
D

You are right: when using persistent queues, every message is written to and read back from the on-disk queue.

From the documentation, this is how the persistent queue sits in the pipeline:

input → queue → filter + output
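
Because the queue sits between the inputs and the filter/output stages, every event is persisted and then re-read even when the queue drains almost immediately, which is why the EBS volume sees constant I/O. A minimal logstash.yml sketch of the relevant settings (the path and size values below are only illustrative, not recommendations):

    # logstash.yml -- enable the persistent queue (illustrative values)
    queue.type: persisted                  # default is "memory"; "persisted" writes every event to disk
    path.queue: /var/lib/logstash/queue    # directory on the EBS volume that takes the read/write load
    queue.max_bytes: 4gb                   # total on-disk capacity before inputs get back-pressured
    queue.page_capacity: 64mb              # size of each queue page file (the default)
    queue.checkpoint.writes: 1024          # checkpoint after this many writes (1 = most durable, most IOPS)

Note that lowering queue.checkpoint.writes towards 1 improves durability but multiplies the checkpoint fsyncs, which is exactly the kind of thing that burns through EBS burst credits faster.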

Thank you @leandrojmp for your prompt reply. How do users generally handle this? Use provisioned-IOPS volumes for persistent queueing? Or put a message queue upstream and not implement queueing in Logstash at all? It seems that persistent queues are a slightly double-edged sword.

It will depend on your infrastructure and preferences. I do not like to use the Logstash persistent queue; I choose to use Kafka as the message queue.

I send everything to a Kafka cluster and Logstash will consume from the topics in Kafka.

If for some reason I can't send directly to Kafka, I ship to Logstash through a pipeline that only forwards to Kafka, and then another pipeline reads from Kafka and sends to Elasticsearch.
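
Roughly, that setup looks like the sketch below (hostnames, ports, topic names and paths are just placeholders):

    # pipelines.yml -- two independent pipelines
    - pipeline.id: ingest-to-kafka
      path.config: "/etc/logstash/conf.d/ingest-to-kafka.conf"
    - pipeline.id: kafka-to-es
      path.config: "/etc/logstash/conf.d/kafka-to-es.conf"

    # ingest-to-kafka.conf -- only forwards to Kafka, no filtering
    input {
      beats { port => 5044 }
    }
    output {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"
        topic_id => "logs-raw"
      }
    }

    # kafka-to-es.conf -- consumes the topic, filters, and indexes
    input {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"
        topics => ["logs-raw"]
        group_id => "logstash-indexers"
      }
    }
    output {
      elasticsearch {
        hosts => ["http://es1:9200"]
        index => "logs-%{+YYYY.MM.dd}"
      }
    }

With Kafka doing the buffering, the Logstash nodes can run with the default in-memory queue, so the local disk is no longer in the hot path for every event.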


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.