Scaling Beats (Filebeat at this moment) to ~5k servers


(Askar) #1

Hi Beats folks,

We recently started playing with Filebeat and find it very useful. But we got a bit stuck with scaling, config management, and some limitations of the outputs (by design, see https://github.com/elastic/beats/issues/1112).

Our pipelines look like this:

server -> kafka -> logstash -> elasticsearch

or

server -> kafka -> samza -> elasticsearch

For delivery and deployment we use puppet, which makes sure Filebeat is pushed and installed. For config management we have an in-house framework (it covers all our apps), which requires substantial changes to accommodate a Filebeat deployment. We did a POC and it works, but it has some limitations on the metadata needed to support multiple topics for multiple Filebeat processes on a server.

I was wondering how others scale Beats in their environments, assuming there are multiple files on each server that Filebeat needs to read and deliver to multiple different topics (let's say to only one broker for now)?


(Steffen Siering) #2

I don't fully understand the actual configuration problems you're facing. More importantly, what exactly do you want filebeat to do?

When sending to kafka, do you want to push to multiple topics? One can use output.kafka.use_type and filebeat.prospectors.X.document_type to configure different topics per prospector type. Support for choosing topics might be enhanced in future versions.

Filebeat supports environment variables for changing settings, plus a configurable 'config' directory. The directory support allows you to put multiple prospector configurations into one directory, e.g. use puppet to put one config per service into the config directory, depending on machine type. After restarting filebeat, the prospector configs are merged with the main filebeat config.
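As a rough sketch of the config-directory approach (the file paths, the service name, and the KAFKA_BROKER variable are made up for illustration, and option names should be checked against the Filebeat version in use):

```yaml
# Main config, e.g. /etc/filebeat/filebeat.yml (path is an assumption)
filebeat.config_dir: /etc/filebeat/conf.d   # hypothetical directory filled by puppet

output.kafka:
  hosts: ["${KAFKA_BROKER}"]   # hypothetical environment variable holding the broker address
  use_type: true               # topic is taken from each event's type
---
# Per-service file, e.g. /etc/filebeat/conf.d/myapp.yml (hypothetical name)
filebeat.prospectors:
  - input_type: log
    paths:
      - "/var/log/myapp/access.log"   # hypothetical log path
    document_type: myapp_access       # hypothetical type, used as the topic name
```

With a layout like this, puppet only has to drop or remove one small file per service; the main config stays the same on every machine.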

Recent nightly builds support:

  • load multiple config files by using -c <file> option multiple times
  • overwrite any config setting from command line using -E <setting>=<value>.

(Askar) #3

I was not sure how to push different events to different topics per prospector. For example, an app generates three different files and I would like to push those files to different topics. From what you are saying, we need to do this:

filebeat.config_dir would hold all our prospector configurations.

The main config file would have something like this for the output:

output.kafka:
  hosts: ["broker"]
  use_type: true
  compression: snappy

and our prospectors would look like this:

filebeat.prospectors:
document_type: topic_name?
  - input_type: log
    paths:
      - "/blah/blah/log_file"

and then the start command would look like filebeat -c main_config.yml?

Or, alternatively, I could run filebeat multiple times with different config files.


(Andrew Kroh) #4

@lightkz, sounds right to me. But the document_type option is part of each individual prospector in the array. So just specify it like so:

filebeat.prospectors:
  - input_type: log
    paths:
      - "/blah/blah/log_file"
    document_type: topic_name
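Extending that to the three-file case you described, a minimal sketch might look like the following. The paths and topic names are hypothetical, and it assumes output.kafka.use_type sends each event to the topic named by its document_type, as discussed above:

```yaml
output.kafka:
  hosts: ["broker"]
  use_type: true          # use each event's type as the kafka topic
  compression: snappy

filebeat.prospectors:
  # One prospector per file, each with its own document_type (= topic).
  - input_type: log
    paths:
      - "/blah/blah/access_log"    # hypothetical path
    document_type: app_access      # hypothetical topic name
  - input_type: log
    paths:
      - "/blah/blah/error_log"     # hypothetical path
    document_type: app_error       # hypothetical topic name
  - input_type: log
    paths:
      - "/blah/blah/audit_log"     # hypothetical path
    document_type: app_audit       # hypothetical topic name
```

That way a single filebeat process per server should be enough, rather than one process per topic.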


(Askar) #5

@andrewkroh, ehh... I see. Thank you so much!

That will simplify our deployment strategy a lot.

