Just looking for a starting point on handling a DMZ firewall. I saw the guide on using logstash but it looks like it resides in the DMZ and pushes data to the cluster residing in our protected network. This won't work for us as policy does not allow for any system in the DMZ to initiate a connection to the protected network.
What is the most reliable way to handle this scenario? And are there how-tos I can review? I imagine the solution would be that the Beats agents still push to a Logstash server residing in the DMZ, and then a Logstash server on the protected network initiates the connection and "pulls" the data collected on the Logstash server in the DMZ?
Just looking for some experienced analysts to point towards the most reliable design.
Logstash does not store data, so you cannot have one Logstash instance pulling data from another instance.
What you need is a message broker like Kafka to store the data received by one logstash and then configure the other logstash to pull the data from this Kafka cluster.
Something like this:
Agents --> ( Logstash -> Kafka Cluster ) <-- Logstash --> Elasticsearch.
Your agents would send your data to a Logstash in the DMZ that would then output it to a Kafka cluster also in the DMZ, then you would have another Logstash outside the DMZ using the Kafka input to consume those messages and send them to Elasticsearch.
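As a minimal sketch of the DMZ side, the Logstash pipeline could look like this (the port, hostnames, and topic name are placeholders you would replace with your own):

```
# DMZ Logstash pipeline: receive from Beats, write to a Kafka cluster in the DMZ
input {
  beats {
    port => 5044                                        # standard Beats port
  }
}
output {
  kafka {
    bootstrap_servers => "kafka1.dmz:9092,kafka2.dmz:9092"  # example broker list
    topic_id => "dmz-logs"                                   # example topic
    codec => json
  }
}
```

The key point is that every connection here stays inside the DMZ; the protected-network Logstash later initiates the connection outward to the Kafka brokers to consume.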
Thank you for the help. Just to double check, can I have the Beats agents output directly to Kafka? Also, are there any good walkthroughs you would recommend for installing and configuring Kafka?
Yes, you can send directly from Beats to Kafka, but sending through Logstash will give you more flexibility if you need to route the data to different topics. It is your choice.
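If you do go direct, the Beats Kafka output is configured in the agent's YAML file; a minimal sketch (hostnames and topic are example values) would be:

```yaml
# filebeat.yml — send events straight to Kafka, skipping the DMZ Logstash
output.kafka:
  hosts: ["kafka1.dmz:9092", "kafka2.dmz:9092"]  # example broker list
  topic: "dmz-logs"                               # example topic name
```

Note that Beats can only have one output enabled at a time, so enabling `output.kafka` means disabling `output.logstash`.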
I do not have any walkthrough for Kafka, but they are pretty easy to find.
Thank you again! Last question: do I need ZooKeeper? If not, I am thinking I still need a Kafka cluster for failover. I believe I can just configure the yml to list all nodes/brokers in the cluster, then on the protected network pull with one Logstash instance, with a second as backup. Again, the Logstash pulls do not need ZooKeeper? Something like this?
ZooKeeper is used by Kafka to synchronize data between the brokers in the cluster; it has no relation to Logstash.
On newer versions of Kafka you can run it without ZooKeeper; you need to configure it to use the internal Raft consensus protocol (KRaft) to synchronize the brokers.
This is unrelated to Logstash: if your Kafka cluster is running, Logstash will be able to consume from it. How you implement your Kafka cluster is out of the scope of this forum, but there are plenty of tutorials on how to spin up a Kafka cluster with or without ZooKeeper; look for something like "Running Kafka cluster with KRaft".
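To answer the failover part: yes, the Kafka input lets you list every broker, and if you run two Logstash consumers with the same `group_id`, Kafka balances the topic between them and the survivor takes over if one goes down. A minimal sketch for the protected-network side (hostnames, topic, and group name are example values):

```
# Protected-network Logstash: pull from the DMZ Kafka cluster, push to Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1.dmz:9092,kafka2.dmz:9092,kafka3.dmz:9092"  # all brokers
    topics => ["dmz-logs"]              # example topic
    group_id => "protected-logstash"    # same group on both Logstash nodes for failover
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["https://es1.internal:9200"]  # example Elasticsearch endpoint
  }
}
```

The connection is initiated from the protected network toward the DMZ brokers, which is what your firewall policy requires.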
Thank you for the info, a huge help!