From the start, I've implemented the Elastic Stack using Logstash as the receiver of logs and the sender of them to Elasticsearch.
I've always implemented it using multiple pipelines, each defined in its own configuration file.
This means that on my Elastic Stack server (a single node) I have to open a port for each of my pipelines and configurations.
One case that I have been discussing with a coworker is syslog files.
Since each provider sends syslogs in a different format, I have a configuration file for each one that filters them as needed and outputs to an Elasticsearch index.
The other way this could be done is with one file, listening on just one port, with many filters inside so each message is mutated in its own way. Similarly, in the output section, I would send them to each index as needed.
Personally, I find a huge configuration file confusing and hard to manage, which is why I separated it.
But I do want to know: is this wrong? Should I just stick it all in one file?
I think that is why you have if conditionals and filters... you need something to distinguish each log from the others and route them with if statements.
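For example, a single pipeline listening on one port can route different syslog formats with conditionals. This is a minimal sketch; the "providerA"/"providerB" match strings, port, hosts, and index names are placeholders for your own values:

    # Single pipeline: one input, conditionals decide filter and output per provider.
    input {
      udp {
        port => 5514
      }
    }
    filter {
      if "providerA" in [message] {
        # provider-specific parsing would go here
        mutate { add_field => { "provider" => "a" } }
      } else if "providerB" in [message] {
        mutate { add_field => { "provider" => "b" } }
      }
    }
    output {
      if [provider] == "a" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "syslog-a-%{+YYYY.MM.dd}"
        }
      } else if [provider] == "b" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "syslog-b-%{+YYYY.MM.dd}"
        }
      }
    }

The trade-off is exactly the one described above: everything is in one file, and a forgotten conditional can send events to the wrong index.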
The advantage of multiple pipelines is that the pipelines are completely isolated from one another, so your events won't get mixed up if you forget a conditional or something like that.
Multiple pipelines is a feature that was implemented in version 6.x; before that, to run different pipelines you needed to run different Logstash instances or have a lot of conditionals in one big file.
In the example you gave, where you have many syslog sources with different formats, you can try to use pipeline-to-pipeline communication; this way you would have only one input and use conditionals to direct the messages to other pipelines.
For example:
    input {
      udp {
        port => 5514
      }
    }
    output {
      if "stringA" in [message] {
        pipeline {
          send_to => ["pipeline1"]
        }
      }
      if "stringB" in [message] {
        pipeline {
          send_to => ["pipeline2"]
        }
      }
    }
Then you would need two other pipelines, pipeline1 and pipeline2 with the following format.
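A sketch of one of the receiving pipelines, assuming the send_to names from the example above; the address in the pipeline input must match the name used in send_to, and the hosts and index values are placeholders:

    # pipeline1: receives events addressed to "pipeline1" by the distributor pipeline.
    input {
      pipeline {
        address => "pipeline1"
      }
    }
    filter {
      # provider-specific filters for this syslog format go here
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "syslog-provider1-%{+YYYY.MM.dd}"
      }
    }

Each pipeline is then registered in pipelines.yml, with the file paths below being an assumption about your layout:

    # pipelines.yml
    - pipeline.id: distributor
      path.config: "/etc/logstash/distributor.conf"
    - pipeline.id: pipeline1
      path.config: "/etc/logstash/pipeline1.conf"
    - pipeline.id: pipeline2
      path.config: "/etc/logstash/pipeline2.conf"

This keeps a single open port while still giving each format its own isolated pipeline.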
It makes no difference to Logstash; it is more about how you want to organize your pipeline.
When Logstash starts, it will merge all the inputs, filters, and outputs for your pipeline, so it doesn't matter if they are in one file or in multiple files.
The use of multiple files helps when you have really big pipelines, so you can edit just a specific part.
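For example, a single pipeline can be split across several files that Logstash merges together; the directory and file names below are hypothetical, with numeric prefixes only to make the ordering obvious:

    # /etc/logstash/conf.d/10-input.conf
    input {
      udp {
        port => 5514
      }
    }

    # /etc/logstash/conf.d/50-filter.conf
    filter {
      mutate {
        add_tag => ["syslog"]
      }
    }

    # /etc/logstash/conf.d/90-output.conf
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "syslog-%{+YYYY.MM.dd}"
      }
    }

At startup these behave exactly as if they were one file containing all three sections.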
It is your choice; there is no right or wrong, nor better or worse.
It depends entirely on your use case and how you want to organize your pipelines. Some people prefer to have one input for each source; other people prefer to have just one input and use conditionals to filter. It is a personal choice.