I am trying to implement a cluster of Logstash nodes in AWS using an Auto Scaling group. I cannot find a clear, reliable metric in the Logstash API that tells me when the persistent queue is completely empty (all logs processed), so that Logstash can be shut down and the node terminated. I suggest adding a single counter to the API, e.g. the number of messages still waiting to be processed. Apologies if this already exists, but based on my tests none of the metrics works. I tested by sending logs while blocking the elasticsearch output: the in and out counters are reset after a restart, the persistent queue files on disk are not garbage-collected until a restart, and so on. None of the existing metrics correlates with the number of events waiting to be processed. Can you shed any light on this?
Persistent queue stats are in the pipeline stats API, but between the time you observe an empty queue and the time you issue a Logstash shutdown, it might have queued new events.
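For reference, here is a minimal sketch of reading the queue depth from the node stats API. It assumes the default monitoring API on localhost:9600 and a pipeline named "main"; the exact field name for the queue depth differs across Logstash versions (events_count vs. events), so check the raw JSON from /_node/stats/pipelines on your version first.

```python
# Hedged sketch: read the persistent-queue depth from the Logstash node stats API.
# Assumes the monitoring API on localhost:9600 and a pipeline id of "main".
import json
import urllib.request

STATS_URL = "http://localhost:9600/_node/stats/pipelines"

def queue_depth(pipeline="main"):
    with urllib.request.urlopen(STATS_URL, timeout=5) as resp:
        stats = json.load(resp)
    queue = stats["pipelines"][pipeline].get("queue", {})
    # Field name varies by Logstash version; fall back across the common ones.
    return queue.get("events_count", queue.get("events", 0))

if __name__ == "__main__":
    print("events still in persistent queue:", queue_depth())
```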
I monitor the persistent queue size (in Zabbix) using this API, but the size is often reported as 0 even on active pipelines, and even after large-queue alerts.
Yes, I checked the stats API, but the data seems inconsistent or incomplete.
Process:
Stop output (elasticsearch)
Let the queue grow (a few MB)
Stop incoming logs (Filebeat)
At this point it is not clear to me which metric to use to know how many records are still waiting to be processed. Both the disk usage and the reported queue size are > 0.
Enable output (elasticsearch)
Let the queue drain
The queue size in bytes is still > 0, and the queue folder on disk is still > 0. If I restart Logstash, the queue file shrinks to 1 KB (garbage collection kicks in). This is the only clue that the queue has been fully processed, but it requires an additional restart.
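If a usable queue-depth counter does exist in your version's stats output, one way to decide when a node can be terminated is to stop the inputs first (Filebeat), then poll that counter until it reads zero several times in a row (to avoid the race mentioned above), and only then stop Logstash and let the scale-in proceed. A rough sketch, reusing the same assumed endpoint, pipeline name, and field names as the earlier example:

```python
# Hedged drain-check sketch for scale-in: after inputs are stopped, wait until the
# persistent queue reports empty on several consecutive polls before shutting down.
# Endpoint, pipeline id, and field names are assumptions; adjust for your setup.
import json
import time
import urllib.request

STATS_URL = "http://localhost:9600/_node/stats/pipelines"

def queue_depth(pipeline="main"):
    # Same helper as in the earlier sketch.
    with urllib.request.urlopen(STATS_URL, timeout=5) as resp:
        stats = json.load(resp)
    queue = stats["pipelines"][pipeline].get("queue", {})
    return queue.get("events_count", queue.get("events", 0))

def wait_for_drain(pipeline="main", required_zero_reads=3, interval=10, max_wait=1800):
    """Return True once the queue reads empty on several consecutive polls."""
    zero_reads = 0
    deadline = time.time() + max_wait
    while time.time() < deadline:
        zero_reads = zero_reads + 1 if queue_depth(pipeline) == 0 else 0
        if zero_reads >= required_zero_reads:
            return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    if wait_for_drain():
        print("queue drained; safe to stop Logstash and terminate the node")
    else:
        print("timed out waiting for the queue to drain")
```

Requiring several consecutive zero readings is just a guard against the observation/shutdown race; it does not replace a real "events remaining" counter, which is what the original request is asking for.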