Hi,
I've seen some behaviour with persistent queues that surprises me. I noticed this when throughput was capped on a Logstash node after the EBS Burst Balance was exhausted for that node's volume. Here is a CloudWatch screenshot illustrating the behaviour:
Given that the persistent queue length is usually very low or zero, it makes me wonder whether every inbound message is written to and read back from disk before being indexed into Elasticsearch?
The persistent queue overview that correlates with the above screenshot can be seen here:
So, can someone clarify what my expectations should be when using persistent queues on EBS volumes? Why does persistent queueing consume so much burst quota even when the node isn't falling particularly far behind?
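For reference, persistent queueing is enabled with settings along these lines in logstash.yml (the values here are illustrative, not my actual config):

```yaml
# Switch the queue from the in-memory default to the on-disk persistent queue
queue.type: persisted
# Upper bound on the queue's total size on disk
queue.max_bytes: 1gb
# Force a checkpoint (fsync) after this many written events; lower values
# mean more durability but more disk I/O per event
queue.checkpoint.writes: 1024
```

My understanding is that with `queue.type: persisted` every event is persisted before it is handed to the pipeline, which would explain the steady disk I/O even with a near-empty queue.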
Thank you @leandrojmp for your prompt reply. How do users generally handle this? Use provisioned IOPS volumes for persistent queueing? Or put a message queue upstream and skip queueing in Logstash entirely? It seems that persistent queues are a bit of a double-edged sword.
It will depend on your infrastructure and preferences. I do not like to use the Logstash persistent queue; I choose to use Kafka as a message queue.
I send everything to a Kafka cluster and Logstash will consume from the topics in Kafka.
If for some reason I can't send directly to Kafka, I ship to Logstash into a pipeline that does nothing but forward to Kafka; another pipeline then consumes from Kafka and sends to Elasticsearch.
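A minimal sketch of that two-pipeline setup, using the standard `kafka` and `elasticsearch` plugins (hostnames, ports, and topic names here are made up):

```conf
# --- forward.conf: receive from shippers, only forward to Kafka ---
input {
  beats { port => 5044 }
}
output {
  kafka {
    bootstrap_servers => "kafka01:9092"   # hypothetical broker address
    topic_id => "logs"
  }
}

# --- index.conf: consume from Kafka, index into Elasticsearch ---
input {
  kafka {
    bootstrap_servers => "kafka01:9092"
    topics => ["logs"]
    group_id => "logstash-indexer"
  }
}
output {
  elasticsearch {
    hosts => ["http://es01:9200"]         # hypothetical ES node
  }
}
```

Each of these would be a separate pipeline in pipelines.yml, so Kafka acts as the durable buffer instead of the Logstash persistent queue.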