I am running 4 Docker containers for ELKB. I have Filebeat set up to bring in log files from a volume folder, and a Logstash filter to process them (they're all in the same format).
I now have another set of log files in a different format. I've created a second pipeline with its own .conf file and filter.
But this means I have two pipelines outputting to Elasticsearch from the same input, port 5044 - problem 1.
My filebeat.yml for the original set of log files looks like this:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/share/logs/**/*.log    # docker volume with many logs in folders
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
...
output.logstash:
  hosts: ["logstash:5044"]
But the other log files have a different format, so the multiline pattern doesn't work - problem 2.
Ok, so I could create a second - type: log section (something like the sketch below), but I still have the problem that the filebeat.yml contains a single output.logstash to port 5044 - back to problem 1.
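I assume the second input would look something like this (the second path and multiline pattern are placeholders for my real ones):

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/share/logs/**/*.log          # first format
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
- type: log
  enabled: true
  paths:
    - /usr/share/other-logs/**/*.log    # second format (placeholder path)
  multiline.pattern: '^\['              # placeholder pattern for the other format
  multiline.negate: true
  multiline.match: after

output.logstash:
  hosts: ["logstash:5044"]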
So my thought was that I need one Filebeat per pipeline... but if so, how do I run multiple Filebeats in a single Docker container? And is this the right thing to do?
Could I use tags and a single pipeline (with if/else in pipeline logstash.conf file)?
Apologies if this question isn't quite clear. I'm fairly new to Docker and Elastic, so it's been a steep learning curve over the last couple of weeks.
I think you are on the right track with creating a second log input, using tags, and if/else in your Logstash pipeline.
Under each log input in your filebeat.yml you can specify an add_tags processor to set a unique tag for that input. Then, in the Logstash pipeline, you can use if/else on the tags to process the events differently.
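A minimal sketch, building on your two inputs above (the tag names and paths are just examples, substitute your own):

filebeat.inputs:
- type: log
  paths:
    - /usr/share/logs/**/*.log
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
  processors:
    - add_tags:
        tags: [format-a]
- type: log
  paths:
    - /usr/share/other-logs/**/*.log    # example path; plus its own multiline settings
  processors:
    - add_tags:
        tags: [format-b]

output.logstash:
  hosts: ["logstash:5044"]

Then, in the single Logstash pipeline config:

filter {
  if "format-a" in [tags] {
    # filters for the first log format, e.g. your existing grok/date filters
  } else if "format-b" in [tags] {
    # filters for the second log format
  }
}

output {
  elasticsearch { hosts => ["elasticsearch:9200"] }   # example host
}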
Thanks Shaunak. So it seems I would have one logstash.conf with if/else, which I guess is OK; it just seems to go against everything I've read about Logstash so far, which urges you to create separate pipelines, as per the Multiple Pipelines documentation:
"Using multiple pipelines is especially useful if your current configuration has event flows that don’t share the same inputs/filters and outputs and are being separated from each other using tags and conditionals"
I guess my logs do share the same output, and only really share the same input because it's all in the one filebeat.yml, but the actual filebeat.inputs are different and the filters are certainly different.
I think it would be really useful to document this approach in the Elastic docs. I've seen a few other people asking similar questions without finding conclusive solutions, and I think it's a common need to have a central ELKB service harvesting logs from many services with different formats (to my novice eyes, anyway!).
I think it really depends on the complexity of your Logstash pipeline. If most of the plugins are common and only a few differences require the if/else, then perhaps one pipeline is better. But if the differences are greater, then multiple pipelines with pipeline-to-pipeline communication might be preferred. So it comes down to what you, the pipeline author and maintainer, consider clean vs. messy from a readability and maintenance point of view.
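For the pipeline-to-pipeline option, a rough sketch of pipelines.yml using the distributor pattern (the pipeline IDs, virtual addresses, and config paths below are just examples):

- pipeline.id: beats-intake
  config.string: |
    input { beats { port => 5044 } }
    output {
      if "format-a" in [tags] {
        pipeline { send_to => ["format-a-processing"] }
      } else {
        pipeline { send_to => ["format-b-processing"] }
      }
    }
- pipeline.id: format-a-processing
  path.config: "/usr/share/logstash/pipeline/format-a.conf"
- pipeline.id: format-b-processing
  path.config: "/usr/share/logstash/pipeline/format-b.conf"

Each downstream config then begins with input { pipeline { address => "format-a-processing" } } (matching its send_to address) and keeps its own filters and elasticsearch output.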