I’m setting up a POC of ELK running on Azure and I’m trying to figure out an architecture that works for the POC but may also work for future uses. Right now I have ELK set up on a single Azure VM and I’m shipping a few application logs via Filebeat. I chose Filebeat over Logstash agents because there seemed to be some guarantee of at-least-once delivery for Filebeat, while I couldn’t find a similar guarantee for Logstash agents. Is that correct?
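For reference, the central side of what I have now is roughly the sketch below; the port and the Elasticsearch address are placeholders rather than my exact config:

```
# Central Logstash pipeline on the Azure VM (sketch; values are placeholders)
input {
  beats {
    port => 5044                      # Filebeat clients ship here
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]       # Elasticsearch on the same VM
  }
}
```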
Ideally, this Azure implementation will work for both our non-Azure applications (that’s currently working, though I’m not sure about future performance since I’m not fronting Logstash with a broker) and our future Azure applications.
Does this seem like a good approach? The Logstash book recommends fronting Logstash with Redis, but because my ELK stack is in Azure and would need to be accessed over the Internet by on-premises applications, Redis doesn't seem like a good option.
I chose Filebeat over Logstash agents because there seemed to be some guarantee of at-least-once delivery for Filebeat, while I couldn’t find a similar guarantee for Logstash agents. Is that correct?
There's no such difference if you use Logstash to ship logs via a lumberjack output/input.
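For example, a Logstash-to-Logstash link over the lumberjack protocol is just an output/input pair, roughly like this (host, port, and certificate paths are placeholders):

```
# Shipper side: Logstash instance running next to the application
output {
  lumberjack {
    hosts => ["elk.example.com"]      # central Logstash host (placeholder)
    port => 5043
    ssl_certificate => "/etc/logstash/lumberjack.crt"
  }
}

# Central side: Logstash instance on the ELK VM
input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/logstash/lumberjack.crt"
    ssl_key => "/etc/logstash/lumberjack.key"
  }
}
```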
Does this seem like a good approach? The Logstash book recommends fronting Logstash with Redis, but because my ELK stack is in Azure and would need to be accessed over the Internet by on-premises applications, Redis doesn't seem like a good option.
Redis being a bad option because of the limited authentication options available, or what do you mean?
There's no such difference if you use Logstash to ship logs via a lumberjack output/input.
That is good to know since at some point, I see value in doing some or all of the filtering on client machines instead of having it all done on a central Logstash instance.
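For example, I could see the client-side pipeline eventually carrying a filter block like the one below before events are shipped to the central instance; the Apache pattern is just a stand-in for whatever our logs actually need:

```
# Hypothetical client-side filtering before shipping (pattern is illustrative only)
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
```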
Redis being a bad option because of the limited authentication options available, or what do you mean?
Essentially, Redis is not recommended for exposure to the public internet. So I'm trying to figure out the best approach for performance and reliability. Right now it doesn't matter much since this is all just a proof of concept, but given our current logging solution, I'm comfortable that the proof will be successful, so I don't want to paint myself into a corner.
If this were all in Azure (or AWS, or local, or wherever), I could easily stand up Redis or RabbitMQ as a broker. I'm just wondering, given the need to ship logs from both Azure and on-premises data centers, whether there is something I should do between the shippers and the central servers to mitigate not having a broker.
If the sources of the logs are on-disk files, you have a natural buffer and the need for a broker for buffering is not so great. Brokers are also useful for distributing load, but you're not there yet. I'd start without a broker and add one if the need arises.
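If you do add a broker later it's mostly an extra output/input pair, e.g. roughly like this with Redis (host and key are placeholders):

```
# Shipper-side pipeline: push events onto a Redis list
output {
  redis {
    host => "broker.example.com"
    data_type => "list"
    key => "logstash"
  }
}

# Indexer-side pipeline: pull events off the same list
input {
  redis {
    host => "broker.example.com"
    data_type => "list"
    key => "logstash"
  }
}
```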