This is not a question about a specific setup; it's more me trying to understand the intended use of Beats / Fleet in a distributed setup.
In the past I built several setups where I used Logstash to transport log data from remote sites to a central Elasticsearch cluster. By "remote sites" I mean everything from cloud-to-on-prem links and sites connected via VPN to a DMZ that's heavily guarded by firewalls.
I wonder how I could manage that when I replace Logstash with Beats / Elastic Agent and Ingest Pipelines.
I know I can have a Beat write to Kafka, which would be a way of transporting data through any kind of barrier, but how can I read from Kafka without Logstash? Is there a Beat I missed, or some functionality in Elasticsearch?
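Just to make concrete what I mean by having a Beat write to Kafka, I'm thinking of something along these lines (a minimal Filebeat sketch; broker addresses and the topic name are placeholders):

```yaml
# filebeat.yml on the remote / DMZ side -- ship events to a Kafka cluster
# instead of directly to Elasticsearch (hosts and topic are placeholders).
filebeat.inputs:
  - type: log
    paths:
      - /var/log/*.log

output.kafka:
  hosts: ["kafka-dmz-1:9092", "kafka-dmz-2:9092"]
  topic: "filebeat-logs"
  compression: gzip
```

Getting data into Kafka like this is the easy half; it's the "read it back out without Logstash" part I'm unsure about.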
The same goes for sending data with Beats from heavily guarded internal networks. I often see hosts that are not allowed to connect to the outside world. How could I get Beats to send data to e.g. Elastic Cloud when I'm not allowed to connect directly? Is there any way to use proxies that I haven't found?
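What I'm hoping exists is something like a proxy setting on the output, roughly along these lines (sketch only; I haven't verified the setting names, and the URLs and credentials are placeholders):

```yaml
# filebeat.yml on a host that may not talk to the internet directly --
# route the Elasticsearch output through an internal HTTP proxy
# (endpoint, proxy URL and credentials are placeholders).
output.elasticsearch:
  hosts: ["https://my-deployment.es.example-region.cloud.es.io:443"]
  proxy_url: "http://proxy.internal.example:3128"
  api_key: "id:api_key_value"
```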
The same goes for Fleet. How can I use it to configure agents that are not allowed to connect to Elasticsearch / Kibana? Is there another proxy mechanism I missed?
Please don't get me wrong. I love Logstash, and you couldn't make me happier than by telling me I should just keep using Logstash. The reason I'm asking is that I see less and less Logstash love in Elastic Agent and Fleet, and I don't want to miss out on all the new possibilities just because I want to keep my beloved Logstash.
Elastic Agent can be run in two modes: standalone and managed. In standalone mode you basically get what Beats give you today, just all under a single supervisor. Only a connection to the output is needed, and you still manage the agent with local config files. If it is managed through Fleet, a connection to Fleet Server is required. Fleet Server can run right beside Elasticsearch or closer to your Elastic Agents, but it always needs a direct connection to Elasticsearch. I suspect that is not possible in your environment.
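To illustrate standalone mode, here is a rough sketch of a local config, just to show that only the output connection matters (hosts and credentials are placeholders, and the exact keys can differ between versions, so treat this as an illustration rather than a reference):

```yaml
# elastic-agent.yml (standalone mode) -- managed by hand or by config
# management; no Fleet / Kibana connection, only the output must be reachable.
outputs:
  default:
    type: elasticsearch
    hosts: ["https://es.internal.example:9200"]
    username: elastic        # placeholder credentials
    password: changeme

inputs:
  - type: system/metrics
    data_stream.namespace: default
    use_output: default
    streams:
      - metricset: cpu
        data_stream.dataset: system.cpu
```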
The goal of Elastic Agent is not to replace Logstash. If you do heavy processing in Logstash and need lots of output options, data duplication, etc., Elastic Agent will not help you in those scenarios. There are some components inside Elastic Agent, like Fleet Server and APM Server, which are more "server" components, but they exist for very specific purposes.
Going back to your DMZ scenario: we are discussing internally how we can support it better in managed mode in the long term. But keep in mind that Elastic Agent with Fleet just went GA and there is still quite a path ahead of us (more to come).
Thank you very much. I really missed the Kafka input and didn't read far enough about Fleet Server. Don't ask me how I managed to let them slip by. Sorry.
While Kafka is definitely not for everyone, it is a valid option for transporting data through network boundaries. That's something!
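For anyone who stumbles over this later: the piece I had overlooked is roughly this, a Filebeat on the receiving side consuming from Kafka and indexing into the central cluster (sketch only; broker and cluster addresses and credentials are placeholders):

```yaml
# filebeat.yml on the central side -- consume what the remote sites
# pushed into Kafka and index it into the central Elasticsearch cluster.
filebeat.inputs:
  - type: kafka
    hosts: ["kafka-dmz-1:9092", "kafka-dmz-2:9092"]
    topics: ["filebeat-logs"]
    group_id: "central-filebeat"

output.elasticsearch:
  hosts: ["https://es.internal.example:9200"]
  api_key: "id:api_key_value"   # placeholder credentials
```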
I'm looking forward to seeing more DMZ-related config options in Elastic Agent. Sitting in the German-speaking part of the world, I can tell you there are a lot of on-prem installations with very tight firewall restrictions. I don't want to push any prejudices, but as I've learned from travelling to other parts of the world, it still feels like there are local flavors in IT. Many bigger companies are still very hesitant to drill holes into their firewalls for log management or to move to cloud-based infrastructure. So everything that helps get data from A to B with buffering, TLS and a single port to open is very welcome.