Hi all,
I'm running a simple Elastic Stack deployment to test Logstash without persistent queues. In this deployment, Logstash only receives data from Filebeat (Apache logs). Both Logstash and Filebeat run fine and seem to be quite resilient.
Since Filebeat ensures at-least-once delivery, I'm not sure whether persistent queues are really needed. Perhaps their main advantage in this case is absorbing bursts of events.
I've tried to simulate a failure scenario, but I don't know how to force an abnormal termination. Logstash manages to complete the task even if it receives kill -9 or kill -2, or if the machine is rebooted.
Without a persistent queue configured, Logstash uses a small in-memory queue, which can lead to data loss if Logstash crashes. You should be able to test this by doing the following:
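For reference, the queue type is controlled in logstash.yml. A minimal sketch (the size and path values are just illustrative):

```yaml
# logstash.yml
# Default behaviour: a small in-memory queue that is lost if the process dies.
queue.type: memory

# To enable the persistent queue instead, you would use something like:
# queue.type: persisted
# queue.max_bytes: 1gb                  # cap on disk space used by the queue
# path.queue: /var/lib/logstash/queue   # directory for the queue data files
```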
Take a test file and configure the pipeline to write its events to a separate index so you can easily verify the number of events once the file has been completely processed.
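A minimal pipeline for this could look like the following sketch (the index name pq-test and the port are assumptions, not anything from your setup):

```
# test-pipeline.conf
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "pq-test"    # dedicated index so the count is easy to check
  }
}
```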
Configure Filebeat to read the new file and send it to a Logstash instance without a persistent queue configured.
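On the Filebeat side, something along these lines (the log path is only an example):

```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/test/apache_access.log   # the test file

output.logstash:
  hosts: ["localhost:5044"]
```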
Once you see events being written to the index in Elasticsearch, stop Elasticsearch. This will cause Logstash to queue events internally and retry sending them to Elasticsearch.
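Assuming a systemd-based install, stopping Elasticsearch is just:

```
sudo systemctl stop elasticsearch
```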
Kill Logstash with kill -9.
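For example (SIGKILL cannot be trapped, so Logstash gets no chance to drain its in-memory queue):

```
# Find the Logstash PID, then terminate it abruptly
pgrep -f logstash
kill -9 <pid>
```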
Restart Elasticsearch and Logstash and wait until no more data is processed through the pipeline. Check the number of records that have been successfully indexed and compare this to the number of events in the file.
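To compare the two counts, something like this works (assuming the file and index names from the sketches above):

```
# Events in the source file vs. documents indexed
wc -l /var/log/test/apache_access.log
curl -s 'localhost:9200/pq-test/_count?pretty'
```

If events that were already acknowledged to Filebeat were still sitting in the in-memory queue at the moment of the kill, the indexed count should come up short; repeating the test with queue.type: persisted should close that gap.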