Logstash 6 - Multiple Pipelines for One Input

Hi all,
Currently, in Logstash 5.x, I have one input (Beats) over which I receive many different kinds of logs.
To grok these logs, I have many config files with filters like
"if [type] == "yxz....." defined.
The result is one large pipeline with many if checks and filters... so large that in my earlier ELK 6 beta tests, Kibana was not able to display the pipeline.
Now, with the support for multiple pipelines in Logstash 6, I want to separate these many kinds of logs into different pipelines.
For illustration, let's say we have the following environment:
On a server I have Filebeat configured with 2 prospectors: syslog for /var/log/syslog* with tags: syslog, and authlog for /var/log/auth* with tags: authlog. What I'm now trying to do is to let Logstash filter these in 2 different pipelines, although both come in over the same input (beats).
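For reference, the Filebeat side of this looks roughly like the following (the paths and tag names come from the description above; the output host and port are just placeholders):

filebeat.prospectors:
  - input_type: log            # "type: log" in Filebeat 6.x
    paths:
      - /var/log/syslog*
    tags: ["syslog"]
  - input_type: log
    paths:
      - /var/log/auth*
    tags: ["authlog"]

output.logstash:
  hosts: ["logstash-host:5044"]   # both prospectors ship to the same Logstash endpoint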

What I tried till now:

  • Define 2 pipelines, both with a beats input (same port), filter, and output. This ends in an "address already in use" error.
  • Define 1 pipeline with only the beats input and 2 pipelines with only filter and output for syslog and authlog (see the pipelines.yml sketch below). The logs never seem to arrive at either of the 2 filter pipelines.
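For reference, the second attempt corresponds roughly to a pipelines.yml like this (the pipeline ids and config paths are just examples):

- pipeline.id: beats-input
  path.config: "/etc/logstash/conf.d/beats-input.conf"
- pipeline.id: syslog
  path.config: "/etc/logstash/conf.d/syslog.conf"
- pipeline.id: authlog
  path.config: "/etc/logstash/conf.d/authlog.conf"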

Does anyone have an idea how I can solve this? In the end, each type of log should have its own pipeline. With this approach I want to prevent a configuration failure in one filter from blocking the whole pipeline and with it the processing of the other logs. E.g. if I change something in the syslog filter and make a mistake, only the syslog pipeline should stop; the authlog pipeline should keep running.

Thanks in advance!

There is no easy way to handle this with multiple pipelines right now without using a broker. The easiest way I can imagine would be to have a single Redis instance, with a different db for each separate pipeline you want to use.

Pipeline 1

input {
  beats {
    port => 5044   # the port your Filebeat instances ship to
  }
}
output {
  # Each redis output/destination will get a copy of the data from the beats input.
  # In practice each output will also need at least data_type and key set
  # (and host, if Redis is not local), e.g.:
  #   host      => "redis-host"
  #   data_type => "list"
  #   key       => "logstash"
  redis {
    db => 1
  }
  redis {
    db => 2
  }
  # insert as many redis outputs as needed
  redis {
    db => 3
  }
  redis {
    db => 4
  }
  redis {
    db => 5
  }
}

Each of the other pipelines reads from a different db:

input {
  redis {
    db => 1
    # data_type and key must match what the fan-out pipeline writes, e.g.:
    #   data_type => "list"
    #   key       => "logstash"
  }
}

filter {
  # your individual pipeline filters go here
}

output {
  # your targeted output (e.g. elasticsearch) goes here
}

This approach multiplies your data into each Redis db, but it allows data to continue to flow if you shut down any of the downstream pipelines reading from Redis. Data will buffer there until the pipeline comes back up.

Be sure to monitor the list lengths in the Redis dbs to ensure that events which aren't getting processed don't spool up and fill memory and disk space forever.
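One simple way to do that, assuming the example key "logstash" and the list data_type from the comments above, is to check the list length per db with redis-cli:

redis-cli -n 1 LLEN logstash
redis-cli -n 2 LLEN logstash
# ...and so on for each db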

Hi Aaron,
Ugh, that's bad. I don't want to use Redis anymore, and I had hoped I wouldn't need this "neverending" if/then, if/then any further.
So what is the use case of multiple pipelines, when I cannot split my inputs into them?
I think most of us get their logs over the same inputs?
If I have multiple inputs and define multiple pipelines for them, how do I make sure that the right filters are used when I save the filters in separate config files?

Or is there a way to send data from one pipeline to another pipeline internally?
That way I could define one general input pipeline and send its data on to other, more specific pipelines.
Then I could define that the "thiskindoflogs" pipeline should run with a persistent queue, but the "thatkindoflogs" pipeline should not, and so on.


There is an issue in GitHub discussing different options for this, where you can weigh in if you want.

My 2 cents: the point behind multiple pipelines is to separate different flows into their own configs and run them as separate pipelines.
A flow, as I see it, is a distinct message format, like Apache access logs vs. Apache error logs.
The flow separation in LS starts with the input, meaning that you should have a dedicated input per flow.
The upstream Beats need to continue that flow separation; each should talk to the port configured in its flow's config.
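As an illustration of that approach (the port, pattern, and index name below are just placeholders, not something prescribed here), the syslog flow would get its own pipeline with its own beats input:

input {
  beats {
    port => 5044      # the syslog Filebeat instance ships to this port
  }
}
filter {
  grok { match => { "message" => "%{SYSLOGLINE}" } }
}
output {
  elasticsearch { index => "syslog-%{+YYYY.MM.dd}" }
}

The authlog flow would mirror this in its own pipeline, with its beats input listening on a different port (e.g. 5045).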


Then what would your flows look like when, e.g., you want to separate syslog, authlog, dpkg, apache_access, and apache_error logs from XXXX servers, all shipped by Filebeat?

You would run multiple instances of filebeat with different configs (paths and logstash host:port).
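In other words, something roughly like this per Filebeat instance (the file names, paths, and ports are only examples):

# filebeat-syslog.yml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/syslog*
output.logstash:
  hosts: ["logstash-host:5044"]

# filebeat-authlog.yml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/auth*
output.logstash:
  hosts: ["logstash-host:5045"]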


Guy has already stated this clearly, but I feel the need to reiterate: multiple pipelines work best with multiple input plugins or sources. Your single Filebeat is a single input plugin, and Logstash sees it as a single data source. If you're ingesting multiple different file types with a single Filebeat and don't want if/then conditional flow control, then Guy's recommendation is best: run multiple Beats instances with a single file type per instance.

I am curious: who has more different input types than different log types shipped over the same input type, so that multiple pipelines are usable as they are now? And who wants to monitor X instances of Filebeat/Logstash on X servers? Before Filebeat, we all had the Logstash shipper; now we have the Filebeat shipper. The difference in the data we get is not technical, it is organisational: we have a syslog, an authlog, an Apache error_log, an Apache access log, and so on. But all of these are technically just logs, and they all come in over Filebeat (or the Logstash shipper). So we should have the possibility to separate these logs organisationally into separate pipelines directly, and not only technically, because technically they are all logs shipped by Filebeat. And I don't think a neverending "if else" condition is the most practical way when you have XXX different types of logs, because you live in an environment with XXXX different kinds of applications.
The simplest solution would be to make it possible for X pipelines to use the same input and decide whether to handle the data or drop it. Then every pipeline lives on its own and just waits for its kind of input.

E.g. the sub-pipelines mentioned in the GitHub issue.

Something has to separate these different log types.

This is a kind of false equivalency. Something still has to differentiate the logs. Logstash uses if/then to recognize the different log types which can be tagged within beats (and should be for the if/then to work properly).
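For example, with tags like the ones from the Filebeat prospectors described earlier, that conditional routing in a single pipeline looks roughly like this (the grok pattern is only a placeholder):

filter {
  if "syslog" in [tags] {
    # syslog-specific filters, e.g.
    grok { match => { "message" => "%{SYSLOGLINE}" } }
  } else if "authlog" in [tags] {
    # authlog-specific filters go here
    grok { match => { "message" => "%{SYSLOGLINE}" } }
  }
}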

It cannot work this way without duplicating the data. Somewhere in the pipeline, it must send a copy of the entire stream to these other inputs which will "drop" based on what? An if/then statement of some kind (whether it's internal to the code, or in the Logstash pipeline definition, it's still there). Yes, even the sub-pipelines feature will do the thing I'm describing: duplicating data, and sending it to different "internal" inputs. At which point, an if/then will determine whether to keep or drop something.

All you're doing is further abstracting the data model, a point we agree can be useful. But it doesn't change the fact that a similar number of if/then statements will at some point be necessary within that broader collection of pipelines (sub or otherwise) to say, "process this, but not that."

Your use case is clearly log file centric. You're shipping many different kinds of log files with a single instance of Beats, and this is a very efficient approach. But not everyone does this. Many people have centralized Logstash setups which receive from multiple TCP, UDP, Beats, and many other inputs. This is the use case where multiple pipelines truly shine, as those other sources can now be received in individual pipelines instead of having to use dozens of lines of if/then statements, or multiple instances of Logstash, to try to isolate them. While we value the end goal of solving your particular use case, it does not devalue the current addition of multiple pipelines.
