Is it better to have one pipeline or multiple pipelines?

Hello

From the start, I've implemented the Elastic Stack using Logstash as the receiver of logs and the sender of those logs to Elasticsearch.

I've always implemented it using multiple pipelines, each defined in its own configuration file.

This means that on the Elastic Stack server (a single node) I have to open a separate port for each of my pipelines and configurations.

One case I have been discussing with a coworker is syslog files.

Since each provider sends syslog in a different format, I have a configuration file for each one, filter the events as needed, and output them to an Elasticsearch index.
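For example, one of those per-provider files looks roughly like this (the port, grok pattern, and index name are just placeholders):

input {
    udp {
        port => 5515    # a dedicated port for this provider
    }
}
filter {
    grok {
        # provider-specific parsing
        match => { "message" => "%{SYSLOGLINE}" }
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "syslog-provider-a-%{+YYYY.MM.dd}"
    }
}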

The other way this could be done is to have one file listening on just one port, with many filters inside so that each format is mutated in its own way. Similarly, in the output section, I would send each one to its index as needed.

Personally, I find a huge configuration file confusing and hard to manage, which is why I separated everything.

But I do want to know: is this wrong? Should I just put it all in one file?

Thank you

I think this is why you have IF conditionals and filters: you need something to distinguish each log from the others and route them with the IF statements.

The advantage of multiple pipelines is that the pipelines are completely isolated from one another, so your events won't get mixed up if you forget a conditional or something like that.

Multiple pipelines is a feature that was implemented in version 6.x; before that, to run different pipelines you needed to run separate Logstash instances or have a lot of conditionals in one big file.

In the example you gave, where you have many syslog sources with different formats, you can try pipeline-to-pipeline communication: this way you would have only one input and use conditionals to direct the messages to the other pipelines.

For example:

input {
    udp {
        port => 5514
    }
}
output {
    if "stringA" in [message] {
        pipeline {
            send_to => "pipeline1"
        }
    }
    if "stringB" in [message] {
        pipeline {
            send_to => "pipeline2"
        }
    }
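    # events that match neither conditional are not sent anywhere and are discarded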
}

Then you would need two other pipelines, pipeline1 and pipeline2, each with the following format (using the matching address):

input {
    pipeline {
        address => "pipeline1"
    }
}
filter { ... }
output { ... }

This way you have just one input and just one listening port, and you can isolate your pipelines using the internal communication between them.
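You also need to declare the three pipelines in pipelines.yml, in the Logstash settings directory. A minimal sketch, assuming the config files live under /etc/logstash/conf.d (the ids and paths here are hypothetical):

# pipelines.yml: one entry per pipeline
- pipeline.id: distributor
  path.config: "/etc/logstash/conf.d/distributor.conf"
- pipeline.id: pipeline1
  path.config: "/etc/logstash/conf.d/pipeline1.conf"
- pipeline.id: pipeline2
  path.config: "/etc/logstash/conf.d/pipeline2.conf"

Note that send_to and address are virtual addresses, independent of the pipeline.id values; they only need to match each other.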

If you want to keep everything in just one pipeline and use conditionals, you can organize it better by splitting it into multiple files.

For example:

000-input.conf
100-filters-syslogA.conf
110-filters-syslogB.conf
120-filters-syslogC.conf
...
XXX-filters-syslogN.conf
999-output.conf 

Then each one of the filter files would have the following format:

filter {
    # condition to isolate this source's events
    if "stringA" in [message] {
        # this source's filters, for example:
        grok {
            match => { "message" => "%{SYSLOGLINE}" }
        }
    }
}

Hello

Thank you for your explanation. I also did not know that pipeline-to-pipeline communication was possible.

My files are a bit different; I do the input, filter, and output all in the same file. Is this incorrect/worse/etc.?

It makes no difference to Logstash; it is more about how you want to organize your pipeline.

When Logstash starts, it merges all the inputs, filters, and outputs for your pipeline, so it doesn't matter whether they are in one file or in multiple files.

Using multiple files helps when you have really big pipelines, since you can edit just a specific part.
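For example, a single pipeline can point at a whole directory of config files. A minimal sketch of the pipelines.yml entry, assuming the usual conf.d layout (the id and path are hypothetical):

- pipeline.id: main
  # all matching files are concatenated in alphabetical order,
  # which is why numeric prefixes like 000- and 999- control ordering
  path.config: "/etc/logstash/conf.d/*.conf"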

Understood.

So, in your opinion, is it better or worse to have several Logstash syslog configuration files, each listening on a different port, or just one?

It is your choice; there is no right or wrong, nor better or worse.

It depends entirely on your use case and how you want to organize your pipelines. Some people prefer to have one input for each source; others prefer to have just one input and use conditionals to filter. It is a personal choice.
