As I understand it, the reasons to use a broker such as Redis in an ELK log pipeline are (please correct me if I'm wrong):
Quickly drain the shipper's (Filebeat or Logstash) internal queue
Prevent Elasticsearch from being overwhelmed during activity peaks
Protect against temporary network outages between the shippers and Elasticsearch (especially if it is hosted in a different DC or cloud)
I have also read in several places that Redis may not be required when Filebeat is used (Filebeat's official page says: "It is intelligent enough to deal with [...] the temporary unavailability of the downstream server, so you never lose a log line.").
However, it is unclear to me how exactly it does that: what the mechanisms are, what the limitations are, and how this can be configured or fine-tuned (I see no options for it in the default config file).
Is it really worth considering dropping the broker when using Filebeat?
Personally, I would still use one: it decouples the pipeline so you can do downstream maintenance, and it makes surges or increases in activity visible. Otherwise, unless you are monitoring your Beats logs, how will you know there is a backlog?
Filebeat uses a registry file. This file records the files and file offsets that have been ACKed by the receiving end (Elasticsearch/Logstash), so when Filebeat is restarted, it continues where it left off. If Elasticsearch/Logstash becomes unavailable, Filebeat keeps retrying to publish the lines until Elasticsearch/Logstash is available again. This guarantees at-least-once delivery: no data is lost, although some lines may be delivered more than once.
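A minimal Python sketch of the registry-plus-retry idea, under a simplified model that is an assumption rather than Filebeat's real implementation: a single log file, a JSON registry mapping the file path to a byte offset, and a hypothetical `FlakyOutput` class standing in for Elasticsearch/Logstash. The names `ship`, `FlakyOutput` and `max_attempts` are made up for illustration and do not reflect Filebeat's actual registry format or options. The point it shows is that the offset is persisted only after the output ACKs a line, so a crash between publishing and saving means that line is re-sent on restart (at-least-once, possibly duplicated) instead of being lost.

```python
import json
import os
import tempfile


class FlakyOutput:
    """Hypothetical downstream (stand-in for Elasticsearch/Logstash) that is
    unavailable for the first `failures` publish attempts, then ACKs."""

    def __init__(self, failures):
        self.failures = failures
        self.received = []

    def publish(self, line):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("downstream unavailable")
        self.received.append(line)
        return True  # ACK


def ship(path, registry_path, output, max_attempts=100):
    # Load the last ACKed offset for this file (0 if never seen)
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    offset = registry.get(path, 0)

    with open(path, "rb") as log:
        log.seek(offset)  # resume where we left off
        for raw in log:
            line = raw.decode().rstrip("\n")
            # Retry the same line until the output ACKs it; nothing is skipped
            for _ in range(max_attempts):
                try:
                    output.publish(line)
                    break
                except ConnectionError:
                    continue  # a real shipper would back off here
            else:
                return  # give up for now; registry still points at this line
            # Persist the offset only AFTER the ACK (at-least-once semantics)
            offset += len(raw)
            registry[path] = offset
            with open(registry_path, "w") as f:
                json.dump(registry, f)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    log_path = os.path.join(tmp, "app.log")
    reg_path = os.path.join(tmp, "registry.json")
    with open(log_path, "w") as f:
        f.write("a\nb\nc\n")
    out = FlakyOutput(failures=2)  # output is "down" for two attempts
    ship(log_path, reg_path, out)
    print(out.received)  # ['a', 'b', 'c']

    # Simulated restart: only the newly appended line is shipped
    with open(log_path, "a") as f:
        f.write("d\n")
    out2 = FlakyOutput(failures=0)
    ship(log_path, reg_path, out2)
    print(out2.received)  # ['d']
```

In this sketch, an outage longer than `max_attempts` simply stops shipping without advancing the registry, so the next run picks up from the first unacknowledged line.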
The limitation is log rotation combined with deletion of old files. If files are generally written faster than they can be processed, or Elasticsearch/Logstash becomes unavailable, and the files are deleted in the meantime, data may be lost (the files are gone). On Linux, deleting a file does not delete the inode or free its space until every process that has the file open has closed it. So if log rotation deletes a file, the disk space is freed only after Filebeat has finished processing the file or Filebeat is restarted (this is a general problem with log-file processing, and affects other tools too).
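The inode behaviour can be demonstrated with a short Python sketch, assuming a POSIX system such as Linux (on Windows, deleting an open file normally fails). The helpers `unlink_while_open` and `unlink_before_open` are hypothetical; they only simulate what a rotation tool and a shipper do and are not Filebeat code. The first case shows that a file deleted while a reader holds it open stays readable, with its inode kept alive at a link count of 0; the second shows that a file deleted before the reader ever opened it is simply gone, which is the data-loss case.

```python
import os
import tempfile


def unlink_while_open(path):
    """Simulate log rotation deleting a file a shipper already has open.

    Returns (name_still_exists, link_count_after_unlink, data_read).
    POSIX-only: on Windows os.remove() on an open file raises an error.
    """
    with open(path) as f:  # the shipper's open handle
        os.remove(path)  # rotation removes the directory entry
        exists = os.path.exists(path)
        nlink = os.fstat(f.fileno()).st_nlink  # 0: no names left, inode kept alive by f
        data = f.read()  # contents are still readable through the handle
    # Once f is closed here, the kernel can free the inode and its disk space
    return exists, nlink, data


def unlink_before_open(path):
    """If the file is deleted before the shipper opens it, the data is lost."""
    os.remove(path)
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    p = os.path.join(tmp, "app.log")
    with open(p, "w") as f:
        f.write("line 1\nline 2\n")
    print(unlink_while_open(p))  # (False, 0, 'line 1\nline 2\n')

    with open(p, "w") as f:
        f.write("line 3\n")
    print(unlink_before_open(p))  # None
```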
Normally a broker is not really required with Filebeat, but some use cases, e.g. very fast log rotation combined with limited disk space, or a limited machine lifetime (e.g. a Docker container or virtual machine that will soon be deleted), may benefit from an additional broker/message queue.