Using pipelines or not... a Shakespearean dilemma!

I have two different applications running on two different servers, shipping logs and exposing server API information to Logstash.

The output from each application has a different data rate. Let's say:

  • App1 sends data @ 1x rate
  • App2 sends data @ 15x rate

Each app sends 4 different types of logs (JSON, log4j, different fields, etc.).
The shipping method also differs across the 4 datasets: 2 come from Filebeat, 1 from the jdbc input plugin, and 1 from the HTTP plugin.
Mixing the data from the two apps is not a current priority.

Currently, for App1, I have 9 Logstash configuration files (a sketch of the layout follows the list below):
4 datasets x 2 files (input, filter) + 1 common output file = 9 config files

Namely:

  1. Filebeat --> Grok --> Elasticsearch
  2. Filebeat --> Grok --> Elasticsearch
  3. JDBC Input (Postgres) --> Mutate/Replace --> Elasticsearch
  4. HTTP Plugin (public API) --> N/A --> Elasticsearch
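
For context, here is roughly how those nine App1 files are laid out today (the file names below are hypothetical, but the structure matches the list above). With the default pipelines.yml, everything is concatenated into a single pipeline:

  /etc/logstash/conf.d/
  ├── 10-app1-filebeat-typeA-input.conf
  ├── 11-app1-typeA-grok-filter.conf
  ├── 20-app1-filebeat-typeB-input.conf
  ├── 21-app1-typeB-grok-filter.conf
  ├── 30-app1-jdbc-postgres-input.conf
  ├── 31-app1-mutate-replace-filter.conf
  ├── 40-app1-http-api-input.conf
  ├── 41-app1-http-filter.conf          # currently a pass-through (no real filter)
  └── 90-elasticsearch-output.conf

  # Logstash concatenates every file matched by path.config into ONE pipeline,
  # so every event goes through every filter unless each filter block is wrapped
  # in a conditional (e.g. on [type], tags, or a Filebeat field).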

I'm wondering what the best practice is for onboarding App2.

Some of the questions I have are:

  • How can I develop my Logstash configuration for App2 without impacting the operation of App1, i.e. without having to restart Logstash while I develop my grok filters?
  • Should I be using multiple pipelines? If so, should I merge all config files (input, filter, output) into one per pipeline, or use pipeline-to-pipeline communication (still in beta)? (See the pipelines.yml sketch after this list.)
  • If I'm running a single node (which I am), what's best from a performance perspective? (The introductory note in the documentation seems to argue that multiple pipelines are the way to go in this case.)
  • Should I use 1 pipeline per dataset or per application/server (data source)?
  • How should I think about the tradeoffs between pipeline complexity, performance, and the ability to keep onboarding new datasets from new apps?
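
To make the multiple-pipelines option concrete, here is a minimal pipelines.yml sketch of the "one pipeline per application" layout (the directory names and worker counts are hypothetical, not my actual setup):

  # /etc/logstash/pipelines.yml
  # One pipeline per application; each pipeline loads only its own config files.
  - pipeline.id: app1
    path.config: "/etc/logstash/conf.d/app1/*.conf"
    pipeline.workers: 1
  - pipeline.id: app2
    path.config: "/etc/logstash/conf.d/app2/*.conf"
    pipeline.workers: 2   # App2 sends ~15x the volume, so it may need more workers

  # bin/logstash --config.reload.automatic
  # With automatic reload enabled, only the pipeline whose files changed is
  # reloaded, so editing App2's grok filters would not disturb the App1 pipeline.

My understanding is that this would also answer the first question (iterating on App2's grok filters without restarting App1), but I'd like to confirm that this is the recommended approach rather than a single pipeline with conditionals or pipeline-to-pipeline communication.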
