I have 10 remote sites whose logs I need to backhaul to a central site for processing. I was going to run Logstash at each remote site and use lumberjack to forward the logs to the central site. My question is how to keep each site's logs isolated from one another.
Has anyone seen a configuration that accomplishes this? My brute-force method is to stand up 10 Logstash instances at the central site and do a 1:1 mapping, but if there is a more efficient way I would certainly welcome the feedback.
At each remote site, install and configure Filebeat to read the logs you want and push them to Logstash at the central site. This way you only need one Logstash instance.
What will the Logstash output be? If you also need to keep the output isolated by origin server, you can add tags in the Filebeat configuration and use conditionals on those tags in Logstash.
We want to use Logstash on the remote sites instead of Filebeat, but I am wondering whether anyone has built a many-to-one Logstash aggregation scheme that still keeps each remote site's data from being combined with other sites' data. I know the 1:1 mapping will work; I was just hoping for a more efficient way to receive the logs at the central site.
Well, you can use the same approach I suggested: add a tag that identifies the source in each of your remote Logstash instances, and then use a single central Logstash that filters on those tags and keeps the data from each source separate.
For example, remote server 01 could add a tag named 'server01' in its Logstash pipeline. When the central Logstash receives the messages, it can filter on this tag and send those events to a separate output.
In your remote Logstash you will need something like this in your filter block:
filter {
  mutate {
    add_tag => "server01"
  }
}
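A fuller sketch of a remote pipeline might look like the following, assuming the logs are read from local files and forwarded with the `lumberjack` output plugin mentioned in the question. The file path, central hostname, port 5044 and certificate path are hypothetical placeholders. The `lumberjack` output requires `hosts`, `port` and `ssl_certificate`, so the central Logstash would need a `beats` input listening on that port with SSL enabled; the SSL option names on the `beats` input differ between plugin versions, so check the documentation for the version you run.

```logstash
# Remote site pipeline (sketch): read local logs, tag them with the site
# name, and forward them to the central Logstash over the lumberjack protocol.
input {
  file {
    # Hypothetical path; point this at the logs you want to ship.
    path => "/var/log/myapp/*.log"
  }
}

filter {
  mutate {
    # Identifies this site; use a different value at each remote site.
    add_tag => [ "server01" ]
  }
}

output {
  lumberjack {
    # Hypothetical central host and port; the central Logstash must be
    # listening here with a beats input that has SSL enabled.
    hosts => [ "central.example.com" ]
    port => 5044
    # Certificate the remote trusts for the central endpoint (hypothetical path).
    ssl_certificate => "/etc/logstash/certs/central.crt"
  }
}
```

Only the tag value changes from site to site, so the same file can be templated across all 10 remote sites. If the plugin is not bundled with your Logstash release, it can be installed with `bin/logstash-plugin install logstash-output-lumberjack`.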
And in the filter and output blocks of the central Logstash you would have something like this:
filter {
  if "server01" in [tags] {
    # other filters
  }
}

output {
  if "server01" in [tags] {
    # output to file, Elasticsearch, etc. that will only apply to events from server01.
  }
}
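With ten sites, repeating an `if "serverNN" in [tags]` block per site works, but the config grows with every site added. One possible alternative, assuming each remote pipeline also adds a field such as a hypothetical `[site]` (for example `mutate { add_field => { "site" => "server01" } }`), is to build the destination from that field with Logstash's `%{...}` field references, so a single output block keeps every site in its own Elasticsearch index. The Elasticsearch address and the `logs-` index prefix below are assumptions.

```logstash
filter {
  # Guard against events that arrive without the hypothetical [site] field,
  # so they land in a catch-all index instead of one named with a literal "%{[site]}".
  if ![site] {
    mutate {
      add_field => { "site" => "unknown" }
    }
  }
}

output {
  elasticsearch {
    # Hypothetical cluster address.
    hosts => [ "http://localhost:9200" ]
    # One index per site per day, e.g. logs-server01-2024.01.31.
    index => "logs-%{[site]}-%{+YYYY.MM.dd}"
  }
}
```

Elasticsearch index names must be lowercase, so the `site` values set at the remote sites should be lowercase too. Per-site conditionals are still the simpler choice if different sites need different filters or entirely different outputs.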
This way you have many remote Logstash instances pushing data to one central Logstash instance, which keeps the data separated by source using tags.