Multiple FileBeats

Hi all,

Apologies if this is a really dumb question, but I've been reading so much that I think I am getting myself confused.

I have a Filebeat agent running on a machine and it's reporting back to my ELK stack server. When I had a single pipeline (main) with Logstash on the default port 5044 it worked really well.

I have now added multiple filebeat.yml's with different configs. When Filebeat starts up it loads all the configs. I noted in the documentation that you still have to have a base filebeat.yml that points at the conf.d directory containing the additional files.

So I have:

filebeatdir > conf.d > filebeat_somelog.yml, filebeat_someother.yml and filebeat_file.yml
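For reference, my base filebeat.yml is roughly along these lines (the host name is just a placeholder):

filebeat.config.inputs:
  enabled: true
  path: conf.d/*.yml              # the external input config files

output.logstash:
  hosts: ["my-elk-server:5044"]   # placeholder host name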

All is good. Filebeat starts up, loads the configs, and I can see it parsing the different inputs as specified in the different configs.

Question

Now I have multiple pipelines, basically one for each of those configs. I am probably wrong, but in each of the Logstash input files I specified a different port for each input, thinking this is how that specific beat would talk to Logstash. I am just lost on how to tell Filebeat to go to a specific pipeline (the pipelines are being created correctly in Elasticsearch).

What's happening now is that all the Filebeat outputs are ending up in the main pipeline, and of course the filter for that pipeline is not filtering correctly.

So for example:
filebeat_somelog.yml > Pipe 1 (with Filter 1)
filebeat_someother.yml > Pipe 2 (with Filter 2)
filebeat_file.yml > Pipe 3 (with Filter 3)

I am sure it's probably obvious, but you know when you read so many different things you just get confused? Would really appreciate the help.

As you describe it, you don't have multiple Filebeats running, but only one. The conf.d is only about providing input configurations via external files. This is about making config management a little easier (a filebeat.yml can become pretty bloated).

Instead of running multiple Filebeat + Logstash instances with multiple ports, you can forward events to the respective pipelines using conditionals. E.g. inputs in Filebeat have a pipeline setting. This setting is used for selecting an Elasticsearch Ingest Node pipeline, but I like to use it when sending to Logstash as well. If set, the beat will send the pipeline name in [@metadata][pipeline] to Logstash, where it can be used for filtering. Logstash drops [@metadata] when publishing events to its outputs.
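If you want to see those [@metadata] fields while testing, a stdout output with the rubydebug codec can print them (just a debugging sketch, not part of your final config):

output {
  stdout { codec => rubydebug { metadata => true } }   # prints events including [@metadata]
}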

Thank you for explaining it. This makes perfect sense and I thought there must have been a way to deal with it in this way.

Just for some more clarity, if you don't mind?

  1. Does the pipeline config go under the - type & path structure, or is it a global config (i.e. where you have filebeat.config)?

  2. On ports can I use for example the same default logstash port 5044 for all the filebeats to send to Elasticsearch?

  3. If I understand you correctly, I would then have one input file on the Logstash side that would have something like:

    input { beats { if [pipeline] == "Foo"
      # do something?

Now on the input above would I have host / port set differently for each incoming beat?

On the output side, I would think it would all be a single output, as the flow should be beats > Logstash input > pipeline with filter > Elasticsearch.

Sorry for all the questions just trying to clarify in my mind.

Reading the documentation, it states that the pipeline setting is only for the Elasticsearch output and not the Logstash output:

  output.elasticsearch:
    hosts: ["localhost:9200"]
    pipeline: my_pipeline_id

Did I read your post incorrectly above that you can use it for the Logstash output?

Its purpose is to be used with Elasticsearch. But Beats can publish events to Logstash/Redis/Kafka as well. Normally one uses Logstash to finally index events into Elasticsearch. The @metadata field is published to all outputs but Elasticsearch. It contains all the information needed to index events into Elasticsearch as the beat would have done it. No one prevents you from abusing that information in Logstash :slight_smile:

Does the pipeline config go under the - type & path structure, or is it a global config (i.e. where you have filebeat.config)?

It's a per-input setting. Use it after - type: ...
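E.g. in one of your conf.d input files it could look roughly like this (the path and pipeline name are placeholders):

- type: log
  paths:
    - /var/log/somelog/*.log   # placeholder path
  pipeline: somelog            # sent to Logstash in [@metadata][pipeline]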

  1. On ports can I use for example the same default logstash port 5044 for all the filebeats to send to Elasticsearch?

Filebeat supports one output only. You must use one port.
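That is, all inputs share the one output.logstash section in your base filebeat.yml, roughly (host name is a placeholder):

output.logstash:
  hosts: ["logstash-host:5044"]   # single output, single port for all inputs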

If I understand you correctly I would then have one input file on the logstash side that would have something like:

Yes and no. I don't think you can use conditionals in Logstash input sections. But you can do this in filters (e.g. create one file with filters per pipeline):

filter {
  if [@metadata][pipeline] == "somelog" {
    # remove [@metadata][pipeline] so already-processed events are not also sent to an Ingest Node pipeline
    mutate { remove_field => ["[@metadata][pipeline]"] }

    # IMPLEMENT ME: actual filters for "somelog"
  }
}

And in outputs one can do:

output {
  if [@metadata][pipeline] {
    elasticsearch {
      ...
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+yyyy.MM.dd}"
      pipeline => "%{[@metadata][pipeline]}"
    }
  } else {
    elasticsearch {
      ...
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+yyyy.MM.dd}"
    }
  }
}

By filtering and removing the pipeline in Logstash, but still forwarding unprocessed pipelines to the Elasticsearch Ingest Node, you can make use of both processing in Logstash and in Ingest Node. Use cases:

  • Reusing filters written against Ingest Node (Logstash acts as a proxy for some data sources, yet can filter/process others).
  • Migrating Ingest Node users to Logstash (one pipeline at a time).
  • Migrating from Logstash to Ingest Node (one pipeline at a time).

As an alternative to conditionals, you can use multiple pipelines in Logstash as shown here (forwarding to pipelines is still in beta): Pipeline-to-pipeline communication | Logstash Reference [8.11] | Elastic

In Logstash, each pipeline has its own inputs, filters, and outputs, and therefore its own set of workers. This allows you to create more concurrent workers for some log sources, but it also increases the management overhead on your side.
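Purely as an illustration (not something from this thread), a pipeline-to-pipeline "distributor" setup in pipelines.yml could look roughly like this, with the intake pipeline forwarding to per-source pipelines via the pipeline output/input plugins; the pipeline ids and inlined configs are placeholders:

# pipelines.yml (sketch)
- pipeline.id: beats-intake
  config.string: |
    input { beats { port => 5044 } }
    output {
      if [@metadata][pipeline] == "suricata" {
        pipeline { send_to => ["suricata"] }
      } else {
        pipeline { send_to => ["catchall"] }
      }
    }

- pipeline.id: suricata
  config.string: |
    input { pipeline { address => "suricata" } }
    # suricata-specific filters go here
    output { elasticsearch { hosts => ["localhost:9200"] } }

- pipeline.id: catchall
  config.string: |
    input { pipeline { address => "catchall" } }
    output { elasticsearch { hosts => ["localhost:9200"] } }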

Thank you @steffens - this is the best and most comprehensive response that I have had on this forum.

So it looks like I am going to have to re-architect / take another look at my solution, as I have done it incorrectly:

  1. I have separate output files for each of the Filebeat file inputs coming in. So, for example:
    Suricata logs - 10_suricata_output.conf (in the Suricata conf.d on logstash)
    Snort logs - 10_snort_output.conf (in the Snort conf.d on logstash)
    Firewall logs - 10_firewall_output.conf (in the normal conf.d on logstash)

  2. I have the same for the input files on Logstash, i.e. I have a 01_<log_Service>_input.conf in each of the relevant conf.d folders for each of the services.
    Then on each input I have a different Logstash port listening, e.g. the Suricata input listening on 5444, Snort on 5744 and firewall on 5844, to separate the comms from each service.

  3. To further "simplify" things in my head (which now seems to be complicating them), I have separate pipelines for each of the log services, so I have:

snort pipeline - pointing to the snort conf.d for a specific snort filter
suricata pipeline - pointing to the suricata conf.d for a specific suricata filter
Firewall pipeline - using the normal logstash conf.d with a filter file.

This is where I was getting highly confused, as I had specific pipelines for each of the log inputs with the relevant filters for those pipelines, and was thinking that having different ports for each of the log inputs would separate them (along with their relevant input and output configs).

All that seemed to happen is that I never knew which pipeline the data was going into - it went into a different pipeline every time I restarted Logstash, and then for some reason no index was being created.

------------------------------- If I understood you correctly, the right approach would be -------------------------------

I did this to "simplify" each service and know where each of the confs were. The way I am understanding your post above -

  1. Add the specific pipeline option under the - type section in the beats input file on the server
  2. Migrate all the logstash inputs from a single input per log service in their relative conf.d into the single logstash input file in the logstash conf.d - with the relevant conditions.
  3. Just not sure if I could still use the filters for each of the pipelines as individual filter files (they are long and complex and would complicate / bloat the input file), or if I have to keep them all in the Logstash input file.
  4. I collapse all the current ports to only use the default port 5044, since it's in the same input file and, as you indicated, I must use one port.
  5. I then use the conditions in the output file (I assume I must collapse all my current single outputs into the one Logstash output file in the conf.d dir).

This should then ensure that the logs go to Elasticsearch via the Logstash pipeline(s) with their relevant filters?

Did I understand you correctly -
Once again thank you for the time and guidance as well as the link for me to read

I see how your initial configuration works and I've seen others running a similar setup. It is a viable option, but I find it somewhat hard to manage. Your initial setup has one advantage, though: if one of the pipelines becomes stalled or is slower, it will create back-pressure only on the Filebeat sending that particular log, and collection of the other files will not be affected. You also get a higher level of concurrency (and higher resource usage) because the pipelines are decoupled end to end. It's a trade-off.

  1. Add the specific pipeline option under the - type section in the beats input file on the server

right

  1. Migrate all the logstash inputs from a single input per log service in their relative conf.d into the single logstash input file in the logstash conf.d - with the relevant conditions.
  2. Just not sure if I could still use the filters for each of the pipelines as individual filter files (they are long and complex and would complicate / bloat the input file), or if I have to keep them all in the Logstash input file.

I don't think you can use conditions in the input section. As the filter sections are independent, I would create one file for the input plus one file per 'pipeline', e.g.:

000_input.conf
100_suricata.conf
100_snort.conf
100_firewall.conf
200_output.conf

The 100_xxx config files will have the filters plus the condition for each of the configured sources. Logstash will load the files in order and merge them into the final filter pipeline. As we put the guard (the condition on the pipeline name) at the top of each filter file, the loading order is no problem.
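As a rough sketch of what 000_input.conf and one of the 100_xxx files could contain (the actual filters are placeholders; 200_output.conf would be the conditional elasticsearch output shown earlier):

# 000_input.conf
input {
  beats {
    port => 5044
  }
}

# 100_suricata.conf
filter {
  if [@metadata][pipeline] == "suricata" {
    # IMPLEMENT ME: suricata-specific filters

    # drop the pipeline name if this source is fully processed in Logstash
    mutate { remove_field => ["[@metadata][pipeline]"] }
  }
}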

  1. I collapse all the current ports to only use the default port 5044, since it's in the same input file and, as you indicated, I must use one port.

Yes. Only one port to configure in your firewall.

  5. I then use the conditions in the output file (I assume I must collapse all my current single outputs into the one logstash output file in the conf.d dir).

Right. We only need the condition to ensure we use the pipeline setting only if [@metadata][pipeline] is still set in an event. If you do not want to use Ingest Node at all, but do all filtering in Logstash, then you won't need the condition - just don't set the pipeline setting in Filebeat.
If you want to set the pipeline in your output, but don't want to maintain the condition in the output (an extra set of output workers), you can set [@metadata][pipeline] to an empty string in the filter section if [@metadata][pipeline] is missing.
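A minimal sketch of that last idea (assuming the elasticsearch output skips the ingest pipeline when the value resolves to an empty string):

filter {
  # default [@metadata][pipeline] to "" when Filebeat did not set it,
  # so the output can always reference it without a condition
  if ![@metadata][pipeline] {
    mutate { add_field => { "[@metadata][pipeline]" => "" } }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+yyyy.MM.dd}"
    pipeline => "%{[@metadata][pipeline]}"
  }
}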
