Logstash 6 - Multiple Pipelines for One Input

Hi all,
Currently, in Logstash 5.x, I have one input (Beats) over which I receive many different kinds of logs.
To grok these logs, I have many config files with filters like
"if [type] == "yxz....." defined.
The result is one large pipeline with many if checks and filters... so large that in my earlier ELK 6 beta tests, Kibana was not able to display the pipeline.
Now, with the support for multiple pipelines in Logstash 6, I want to separate these many kinds of logs into different pipelines.
For illustration, let's say we have the following environment:
On a server I have Filebeat configured with 2 prospectors: syslog for /var/log/syslog* with tags: syslog, and authlog for /var/log/auth* with tags: authlog. What I'm now trying to do is to let Logstash filter these in 2 different pipelines, although both come in over the same input (beats).
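For reference, the Filebeat side of this looks roughly like the following (the paths and tag names come from the description above; the output host and port are just placeholders):

filebeat.prospectors:
  - input_type: log            # "type: log" in Filebeat 6.x
    paths:
      - /var/log/syslog*
    tags: ["syslog"]
  - input_type: log
    paths:
      - /var/log/auth*
    tags: ["authlog"]

output.logstash:
  hosts: ["logstash-host:5044"]   # both prospectors ship to the same Logstash endpoint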

What I tried till now:

  • Define 2 pipelines, both with a beats input (same port), filter, and output. This ends in an "address already in use" error.
  • Define 1 pipeline with only the beats input and 2 pipelines with only filter and output for syslog and authlog (see the pipelines.yml sketch below). The logs never seem to arrive at either of the 2 filter pipelines.
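For reference, the second attempt corresponds roughly to a pipelines.yml like this (the pipeline ids and config paths are just examples):

- pipeline.id: beats-input
  path.config: "/etc/logstash/conf.d/beats-input.conf"
- pipeline.id: syslog
  path.config: "/etc/logstash/conf.d/syslog.conf"
- pipeline.id: authlog
  path.config: "/etc/logstash/conf.d/authlog.conf"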

Does anyone have an idea how I can solve this? In the end, each type of log should have its own pipeline. With this approach I want to prevent a configuration failure in one filter from blocking the whole pipeline and with it the processing of the other logs. E.g. if I change something in the syslog filter and make a mistake, only the syslog pipeline should stop; the authlog pipeline should keep running.

Thanks in advance!

There is no easy way to handle this with multiple pipelines right now without using a broker. The easiest way I can imagine would be to have a single Redis instance, with a different db for each separate pipeline you want to use.

Pipeline 1

input {
  beats {
    port => 5044   # the port your Filebeat instances ship to
  }
}
output {
  # Each redis output/destination will get a copy of the data from the beats input.
  # In practice each output will also need at least data_type and key set
  # (and host, if Redis is not local), e.g.:
  #   host      => "redis-host"
  #   data_type => "list"
  #   key       => "logstash"
  redis {
    db => 1
  }
  redis {
    db => 2
  }
  # insert as many redis outputs as needed
  redis {
    db => 3
  }
  redis {
    db => 4
  }
  redis {
    db => 5
  }
}

Each of the other pipelines reads from a different db:

input {
  redis {
    db => 1
    # data_type and key must match what the fan-out pipeline writes, e.g.:
    #   data_type => "list"
    #   key       => "logstash"
  }
}

filter {
  # your individual pipeline filters go here
}

output {
  # your targeted output (e.g. elasticsearch) goes here
}

This approach multiplies your data into each Redis db, but it allows data to continue to flow if you shut down any of the downstream pipelines reading from Redis. Data will buffer there until the pipeline comes back up.

Be sure to monitor the list lengths in the Redis dbs to ensure that events which aren't getting processed don't spool up and fill memory and disk space forever.
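One simple way to do that, assuming the example key "logstash" and the list data_type from the comments above, is to check the list length per db with redis-cli:

redis-cli -n 1 LLEN logstash
redis-cli -n 2 LLEN logstash
# ...and so on for each db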

Hi Aaron,
Ugh, that's bad. I don't want to use Redis anymore, and I had hoped I wouldn't need this "neverending" if/then, if/then any further.
So what is the use case of multiple pipelines, when I cannot split my inputs into them?
I think most of us get their logs over the same inputs?
If I have multiple inputs and define multiple pipelines for them, how do I make sure that the right filters are used when I save the filters in separate config files?

Or is there a way to send data from one pipeline to another pipeline internally?
That way I could define one general input pipeline and send its data on to other, more specific pipelines.
Then I could define that the "thiskindoflogs" pipeline should run with a persistent queue, but the "thatkindoflogs" pipeline should not, and so on.


There is an issue in GitHub discussing different options for this, where you can weigh in if you want.

My 2 cents: the point behind multiple pipelines is to separate different flows into their own configs and run them as separate pipelines.
A flow, as I see it, is a distinct message format, like Apache access logs vs. Apache error logs.
The flow separation in LS starts with the input, meaning that you should have a dedicated input per flow.
The upstream Beats need to continue that flow separation; each should talk to the port configured in its flow's config.
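As an illustration of that approach (the port, pattern, and index name below are just placeholders, not something prescribed here), the syslog flow would get its own pipeline with its own beats input:

input {
  beats {
    port => 5044      # the syslog Filebeat instance ships to this port
  }
}
filter {
  grok { match => { "message" => "%{SYSLOGLINE}" } }
}
output {
  elasticsearch { index => "syslog-%{+YYYY.MM.dd}" }
}

The authlog flow would mirror this in its own pipeline, with its beats input listening on a different port (e.g. 5045).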


Then what would your flows look like when, e.g., you want to separate syslog, authlog, dpkg, apache_access, and apache_error logs from XXXX servers, all shipped by Filebeat?

You would run multiple instances of filebeat with different configs (paths and logstash host:port).
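In other words, something roughly like this per Filebeat instance (the file names, paths, and ports are only examples):

# filebeat-syslog.yml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/syslog*
output.logstash:
  hosts: ["logstash-host:5044"]

# filebeat-authlog.yml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/auth*
output.logstash:
  hosts: ["logstash-host:5045"]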


Guy has already stated this clearly, but I feel the need to reiterate: multiple pipelines work best with multiple input plugins or sources. Your single Filebeat is a single input plugin, and Logstash sees it as a single data source. If you're ingesting multiple different file types with a single Filebeat and don't want if/then conditional flow control, then Guy's recommendation is best: run multiple Beats instances with a single file type per instance.

I am curious: who has more different input types than different log types shipped over the same input type, so that multiple pipelines are usable as they are now? And who wants to monitor X instances of Filebeat/Logstash on X servers? Before Filebeat, we all had the Logstash shipper; now we have the Filebeat shipper. The difference in the data we get is not technical, it is organisational: we have a syslog, an authlog, an Apache error_log, an Apache access log, and so on. But all of these are technically just logs, and they all come in over Filebeat (or the Logstash shipper). So we should have the possibility to separate these logs organisationally into separate pipelines directly, and not only technically, because technically they are all logs shipped by Filebeat. And I don't think a neverending "if else" condition is the most practical way when you have XXX different types of logs, because you live in an environment with XXXX different kinds of applications.
The simplest solution would be to make it possible for X pipelines to use the same input and decide whether to handle the data or drop it. Then every pipeline lives on its own and just waits for its kind of input.

E.g. the sub-pipelines mentioned in the GitHub issue.

Something has to separate these different log types.

This is a kind of false equivalency. Something still has to differentiate the logs. Logstash uses if/then to recognize the different log types which can be tagged within beats (and should be for the if/then to work properly).
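For example, with tags like the ones from the Filebeat prospectors described earlier, that conditional routing in a single pipeline looks roughly like this (the grok pattern is only a placeholder):

filter {
  if "syslog" in [tags] {
    # syslog-specific filters, e.g.
    grok { match => { "message" => "%{SYSLOGLINE}" } }
  } else if "authlog" in [tags] {
    # authlog-specific filters go here
    grok { match => { "message" => "%{SYSLOGLINE}" } }
  }
}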

It cannot work this way without duplicating the data. Somewhere in the pipeline, it must send a copy of the entire stream to these other inputs which will "drop" based on what? An if/then statement of some kind (whether it's internal to the code, or in the Logstash pipeline definition, it's still there). Yes, even the sub-pipelines feature will do the thing I'm describing: duplicating data, and sending it to different "internal" inputs. At which point, an if/then will determine whether to keep or drop something.

All you're doing is further abstracting the data model, a point we agree can be useful. But it doesn't change the fact that a similar number of if/then statements will at some point be necessary within that broader collection of pipelines (sub or otherwise) to say, "process this, but not that."

Your use case is clearly log file centric. You're shipping many different kinds of log files with a single instance of Beats, and this is a very efficient approach. But not everyone does this. Many people have centralized Logstash setups which receive from multiple TCP, UDP, Beats, and many other inputs. This is the use case where multiple pipelines truly shine, as those other sources can now be received in individual pipelines instead of having to use dozens of lines of if/then statements, or multiple instances of Logstash, to try to isolate them. While we value the end goal of solving your particular use case, it does not devalue the current addition of multiple pipelines.
