Scaling Beats (Filebeat at this moment) to ~5k servers

lightkz · July 20, 2016, 4:57pm

Hi Beats folks,

We just recently started playing with Filebeat and see a lot of usefulness in it. But we got a bit stuck with scaling, config management, and some limitations of the outputs (by design, see https://github.com/elastic/beats/issues/1112).

Our pipelines look like this:

server -> kafka -> logstash -> elasticsearch

or

server -> kafka -> samza -> elasticsearch

For delivery and deployment we use puppet, which makes sure filebeat is pushed and installed. For config management we have an in-house framework (it covers all our apps), which requires substantial changes to accommodate a filebeat deployment. We did a POC and it works, but it has some limitations on the metadata needed to support multiple topics for multiple filebeat processes on the server.

I was wondering how others scale Beats in their environments, assuming we have multiple files on each server that filebeat needs to read and deliver to multiple different topics (let's say only to one broker for now)?

steffens · July 20, 2016, 7:49pm

I don't fully understand the actual configuration problems you're facing. More importantly, what exactly do you want filebeat to do?

When sending to kafka, you want to push to multiple topics? In that case, use output.kafka.use_type and filebeat.prospectors.X.document_type to configure a different topic per prospector type. Support for choosing topics might be enhanced in future versions.

Filebeat supports environment variables for changing settings, plus a configurable 'config' directory. The directory support allows you to put multiple prospector configurations into one directory, e.g. use puppet to put one config per service into the config directory, depending on machine type. After restarting filebeat, the prospector configs are merged with the main filebeat config.
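As a rough sketch of that layout (not a tested setup): the directory /etc/filebeat/conf.d, the file name myapp.yml, the log path, and the KAFKA_BROKER variable below are all made-up names for illustration, and the exact expansion syntax may differ between versions.

```yaml
# main_config.yml (sketch; directory and variable names are hypothetical)
filebeat.config_dir: /etc/filebeat/conf.d   # extra prospector files are read from here

output.kafka:
  # ${KAFKA_BROKER} is assumed to be set in filebeat's environment at startup
  hosts: ["${KAFKA_BROKER}"]
---
# /etc/filebeat/conf.d/myapp.yml (hypothetical per-service file, e.g. dropped in by puppet)
# Only the prospector part of such a file is expected to be picked up.
filebeat.prospectors:
  - input_type: log
    paths:
      - "/var/log/myapp/*.log"
```

With something like this, puppet would only need to add or remove one small file per service, and the main config stays identical across machine types.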

Recent nightly builds support:

- loading multiple config files by using the -c <file> option multiple times
- overwriting any config setting from the command line using -E <setting>=<value>
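To illustrate the two options, here is a hedged sketch of a second config file that carries only the output section. The file name kafka-output.yml is hypothetical, and the invocation in the comments assumes a build that already has both flags; how later files and -E values take precedence over earlier ones should be verified on the build you actually run.

```yaml
# kafka-output.yml (hypothetical second config file)
#
# Possible invocation, combining both options:
#   filebeat -c main_config.yml -c kafka-output.yml -E output.kafka.compression=gzip
output.kafka:
  hosts: ["broker"]
  use_type: true
  compression: snappy   # the -E flag above is meant to override this value
```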

lightkz · July 20, 2016, 8:57pm

I was not sure how to push different events to different topics per prospector. For example, an app generates three different files and I would like to push those files to different topics. From what you are saying, we need to do this:

filebeat.config_dir would have all our prospector configurations.

The main config file for the output would have something like:

output.kafka:
  hosts: ["broker"]
  use_type: true
  compression: snappy

and our prospectors would look like this:

filebeat.prospectors:
  document_type: topic_name?
  - input_type: log
    paths:
      - "/blah/blah/log_file"

And then the start command would look like filebeat -c main_config.yml?

Or, alternatively, I would run filebeat multiple times with different config files.

andrewkroh · July 21, 2016, 10:55pm

@lightkz, sounds right to me. But the document_type option is part of each individual prospector in the array. So just specify it like so:

filebeat.prospectors:
  - input_type: log
    paths:
      - "/blah/blah/log_file"
    document_type: topic_name
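For the three-files-to-three-topics case described earlier in the thread, a minimal sketch could look like the following. The file names and topic names are invented for illustration; the output section repeats the settings already quoted above, with use_type making each event's type the Kafka topic.

```yaml
# Sketch only: one prospector per log file, each with its own document_type.
filebeat.prospectors:
  - input_type: log
    paths:
      - "/blah/blah/access_log"     # hypothetical file name
    document_type: app_access       # hypothetical topic name
  - input_type: log
    paths:
      - "/blah/blah/error_log"      # hypothetical file name
    document_type: app_error        # hypothetical topic name
  - input_type: log
    paths:
      - "/blah/blah/audit_log"      # hypothetical file name
    document_type: app_audit        # hypothetical topic name

output.kafka:
  hosts: ["broker"]
  use_type: true        # topic is taken from each event's document_type
  compression: snappy
```

This keeps everything in a single filebeat process, so there should be no need to run filebeat several times with different config files.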

lightkz · July 22, 2016, 7:19pm

@andrewkroh, ehh... I see. Thank you so much!

That will solve a lot of our deployment problems.

system · August 10, 2016, 4:58pm

This topic was automatically closed after 21 days. New replies are no longer allowed.
