We currently have a web cluster in Azure that streams HTTP logs from HAProxy over syslog to an Azure-hosted Logstash instance. Logstash then has two outputs: the first writes the logs to an Azure storage bucket (reliable), and the second ships them to an onsite ELK cluster (relatively unreliable).
It seems that if the second output to ELK fails, Logstash also stops writing to the first, and eventually the buffer fills up and we start losing logs. We want the first (file) output to be reliable, because we can always reconstruct the ELK data from those files if needed.
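To give an idea of the shape of the config (simplified, with placeholder paths and hostnames, and the storage output shown as a plain file output):

```
input {
  syslog { port => 514 }      # HTTP logs streamed from HAProxy over syslog
}

output {
  # 1st output: the one we need to be reliable
  file { path => "/var/log/haproxy/%{+YYYY-MM-dd}.log" }

  # 2nd output: the onsite ELK cluster, which is relatively unreliable
  elasticsearch { hosts => ["https://onsite-elk.example.com:9200"] }
}
```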
What is the best way to make the ELK output "best effort", so that it doesn't block the pipeline if it fails? We've tried adding a second Azure Logstash instance that forwards to ELK, shipping the logs to it over UDP so that in theory the first instance doesn't know if they're being dropped. That seems to work, but it's very clunky!
You need some kind of multi-pipeline setup (either a single Logstash instance with multiple pipelines, or multiple instances), probably with a message broker in between. For example, you could have one Logstash pipeline that receives the syslog messages and just pushes them onto the broker, while two other pipelines read from queues in that broker and ship to their respective outputs.
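A rough sketch of how that could look with Redis as the broker (Kafka would work just as well; the pipeline IDs, queue keys, paths, and hosts below are only placeholders):

```yaml
# pipelines.yml -- three independent pipelines in one Logstash instance
- pipeline.id: intake
  path.config: "/etc/logstash/conf.d/intake.conf"
- pipeline.id: to-storage
  path.config: "/etc/logstash/conf.d/to-storage.conf"
- pipeline.id: to-elk
  path.config: "/etc/logstash/conf.d/to-elk.conf"
```

```
# intake.conf -- receive from HAProxy and fan out to two broker queues
input  { syslog { port => 514 } }
output {
  redis { key => "logs-storage" data_type => "list" }   # queue for the storage output
  redis { key => "logs-elk"     data_type => "list" }   # queue for the ELK output
}

# to-storage.conf -- drain the storage queue into the reliable output
input  { redis { key => "logs-storage" data_type => "list" } }
output { file { path => "/var/log/haproxy/%{+YYYY-MM-dd}.log" } }   # or whatever you use for Azure storage

# to-elk.conf -- drain the ELK queue into the onsite cluster, best effort
input  { redis { key => "logs-elk" data_type => "list" } }
output { elasticsearch { hosts => ["https://onsite-elk.example.com:9200"] } }
```

If the ELK cluster goes down, only the logs-elk queue backs up in the broker; the intake and storage pipelines keep running, so the reliable output never stalls.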
Thanks for the answer. That's roughly what we do now, except we ship the logs over UDP to the second instance; it just makes things more complicated than I'd like. I'm guessing there is no way to configure it so that once a log is written to one output, the pipeline isn't blocked if the other outputs fail?