I have a pipeline configured to use the syslog plugin as its input data source, and I would like to enable the Persistent Queue (PQ) feature to prevent potential data loss if Logstash crashes. However, from the official docs it looks like PQ only fully protects inputs like Beats and HTTP.
From the PQ docs:
"Input plugins that do not use a request-response protocol cannot be protected from data loss. Tcp, udp, zeromq push+pull, and many other inputs do not have a mechanism to acknowledge receipt to the sender. (Plugins such as beats and http, which do have an acknowledgement capability, are well protected by this queue.)"
I would like to confirm whether the syslog plugin is supported, and what would be an efficient way to verify this?
By the way, since all requests will be queued locally and the rest is just a matter of delivering to the output, why would such a "request-response" capability be required for PQ? Is it just to notify the data source to retry if queuing the log fails?
The syslog input is basically a tcp or udp input with some built-in grok to parse the message, so I would expect it to have the same limitations as the tcp or udp inputs.
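To make that concrete, a syslog input behaves roughly like the sketch below; the plugin's actual internals differ, and the port and grok pattern here are just illustrative.

    # Rough equivalent of `input { syslog { port => 514 } }` -- a sketch,
    # not the plugin's real implementation.
    input {
      tcp { port => 514 type => "syslog" }
      udp { port => 514 type => "syslog" }
    }
    filter {
      if [type] == "syslog" {
        grok {
          # SYSLOGLINE is a stock grok pattern for RFC3164-style lines
          match => { "message" => "%{SYSLOGLINE}" }
        }
        # extract facility/severity from the priority field
        syslog_pri { }
      }
    }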
You can still use persistent queues with the syslog input, but as the documentation says, it will not prevent data loss in this case.
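Enabling it is just a settings change, something like this in logstash.yml (the path is a placeholder; by default the queue lives under path.data):

    # logstash.yml
    queue.type: persisted                  # default is "memory"
    path.queue: /var/lib/logstash/queue    # optional, placeholder path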
If Logstash crashes or the queue is full, the source will continue to send events, but they will be dropped on the Logstash server.
This does not happen with the beats input, for example: when Logstash is down or the queue is full, Beats will keep retrying to send the events.
Am I right to assume that PQ would still prevent data loss for events that have already been accepted by the Logstash server, i.e. events that would otherwise only live in memory? It's just that the data source wouldn't be aware of Logstash's status (crashed or queue full) and would keep pumping events at it.
One more question: if an event fails to be delivered to Elasticsearch, meaning it is never marked as "ACKed", how does the queue handle this? Does it put the event back in the queue and keep retrying, or does it drop it, in which case I would need the Dead Letter Queue (DLQ) to cover this? I see a bit of overlap between PQ and DLQ here.
Persistent queues sit between the input and the filter stages in the pipeline. If the pipeline backs up then the queue can transmit the back pressure into the input and tell it to stop accepting events.
For a beats input, the protocol allows that back pressure to be passed further along to the beat itself, telling it to pause sending events.
With a udp input there is no way to tell the source to stop sending events.
For a tcp input the transmission window should close, which would force the source to stop sending. I do not know whether it is true that a PQ does not protect data sent over TCP. It seems odd to me that it would not, but I have not checked the code base.
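For reference, the point at which the queue starts applying back pressure is controlled by its size limits in logstash.yml; the values below are only examples, not recommendations.

    # logstash.yml
    queue.type: persisted
    queue.max_bytes: 2gb     # back pressure starts once the on-disk queue hits this size
    queue.max_events: 0      # 0 means unlimited; otherwise the event count also caps the queue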
Once an event is accepted into the PQ, the input no longer has to concern itself with it. When the event moves from the filter stage to the output, the DLQ comes into play.
In theory several output plugins could support DLQs, but I believe only the Elasticsearch output does so.
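If I remember the behaviour correctly, the elasticsearch output keeps retrying errors it considers retriable (such as 429s), and only sends events that Elasticsearch rejects outright (mapping errors and the like) to the DLQ, provided it is enabled. A rough sketch of a DLQ setup, with placeholder paths and hosts:

    # logstash.yml -- turn the dead letter queue on (path is a placeholder)
    dead_letter_queue.enable: true
    path.dead_letter_queue: /var/lib/logstash/dead_letter_queue

    # A separate pipeline that reprocesses events the elasticsearch output
    # could not index; path, hosts and index name are placeholders.
    input {
      dead_letter_queue {
        path => "/var/lib/logstash/dead_letter_queue"
        commit_offsets => true
      }
    }
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "dlq-reprocessed-%{+YYYY.MM.dd}"
      }
    }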
"failing to deliver the event to Elasticsearch" is a pretty complicated subject. This, and the threads it links to, might be helpful (or may just confuse you!)
If for some reason you are using an http output to talk to Elasticsearch then this may be useful.