How do you manage large filebeat installations?

Elastic has made huge leaps toward centralized management recently. Fleet and Agent seem to be coming along. But while they seem to work well for metrics, syslog, etc., I'm still struggling a bit with a usable concept for managing a large Filebeat setup.
Let's say you have several hundred Windows servers with custom applications, all of them with different configurations. While the Filebeat installations themselves can usually be managed with whatever endpoint management an organization has available - how are people managing the configurations?

I had a similar use case at a company I worked for a couple of years ago.

We had a lot of custom applications and somewhere between 15 and 20 servers, and not all of the servers ran the same applications. The approach I took at the time was that each application would have its own separate yml configuration file pointing to its logs and adding a couple of custom fields. The filebeat.yml would then point to a folder that held all of the configuration files, regardless of whether the applications were present on the server or not.

To manage the individual configuration files we used our internal Git server (Bitbucket) and had an automation process that would copy the files to all the servers whenever there was an edit.

The next step would have been to use our internal CI/CD tool (Bamboo) to 'deploy' the configurations. Since this tool already deployed the applications to all the servers, it could simply copy the files whenever there was an edit or a new application, but I left the company before this was fully implemented.

So you kept the configuration files for every single machine in version control, then rolled them out via Bamboo? How exactly did you map them, one folder per machine?

Every application's yml configuration had the path to its logs, and the applications always saved their logs in the same place on every machine.

In our case we saved our logs in the following structure in windows:

E:\Logs\ApplicationName\*.log

So, for AppA, the log path would be E:\Logs\AppA\*.log, and every server that ran AppA would have this folder.

We then had a Filebeat config folder with all the configuration files, stored in E:\Filebeat\. So if I had just two apps, AppA and AppB, this folder would contain the files AppA.yml and AppB.yml.

In filebeat.yml I would point to the config folder:

filebeat.config:
  inputs:
    enabled: true
    path: 'E:\Filebeat\*.yml'   # single quotes: in YAML double quotes, \F would be an invalid escape
    reload.enabled: true
    reload.period: 10s

The individual app configuration files would be something like this:

- type: log
  paths:
    - 'E:\Logs\AppA\*.log'   # single quotes so the backslashes aren't treated as YAML escapes
  scan_frequency: 10s

This is documented here.

Thanks!

Now I'm getting it. So you had the configurations for all the applications on every machine and distributed them via CI. If an application didn't happen to be on a given machine, the folders weren't there and its config was simply ignored. Makes sense to me...

This way, you don't have to differentiate between machines and check which config needs to be copied. And whatever might need to be added as metadata for a particular machine can be added via the add_fields processor in the central filebeat.yml...
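A minimal sketch of what that per-machine metadata could look like in the central filebeat.yml, using the add_fields processor (the target namespace and field values here are hypothetical, not from this thread):

```yaml
processors:
  - add_fields:
      target: machine          # hypothetical namespace for per-machine metadata
      fields:
        datacenter: dc-01      # hypothetical values, set per machine at deploy time
        owner_team: platform
```

Since this lives in the central filebeat.yml, it applies to every input loaded from the config folder, so the per-application files stay identical across machines.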

That's it.

I also used some custom fields with environment variables to add metadata. For example, every server had a variable with its IP address and its type, something like this:

FBEAT_VAR_IPADDR = 10.0.1.100
FBEAT_VAR_TYPE = database

Then, every app configuration had custom fields:

- type: log
  paths:
    - 'E:\Logs\AppA\*.log'   # single quotes so the backslashes aren't treated as YAML escapes
  fields:
    ip_addr: "${FBEAT_VAR_IPADDR}"
    server_type: "${FBEAT_VAR_TYPE}"

So every document from every log would have those two fields to help with filtering in Elasticsearch and Kibana.

The use of environment variables in filebeat is documented here.

Using a central location for your configuration plus environment variables can help you automate Filebeat on Windows. I had the same approach on Linux machines, where the automation was way easier.
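For reference, the same pattern on Linux is just the filebeat.config pointer with a Unix-style drop-in directory (the path below is a common convention, assumed rather than taken from the original setup):

```yaml
filebeat.config:
  inputs:
    enabled: true
    path: /etc/filebeat/inputs.d/*.yml   # assumed drop-in directory for per-app configs
    reload.enabled: true
    reload.period: 10s
```

No quoting workaround is needed here since Unix paths contain no backslashes.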


In general, automation on Linux is much easier. Eventually, the problem is going to eliminate itself as we move to containers. But until then, we're kind of stuck dealing with Windows servers and applications producing file-based logs...

Hi!

Is there something stopping you from using Agent and Fleet for this logs use case?

You could use your normal endpoint management to install Agent, then you can create a Policy per type of machine (assuming some machines do run the same applications) and enroll Agents into the matching policies.

The aim of Fleet and policy management is that the configuration for your endpoints is centrally managed, which seems to be your concern.

Absolutely. Fleet - at this point - has incredibly limited functionality when it comes to configuration. It's perfectly fine for metrics, but even simple things like custom fields per log path (to map applications to logs), multiline patterns, etc. don't seem to be configurable via Fleet. The "custom logs" integration seems to be meant for some of these things, but documentation on how to configure it literally doesn't exist on the public web. Plus, all of this still seems to be in beta and undergoing changes.

The first thing we looked at was actually Fleet. But as it is now, it's not a feasible replacement for an environment with a variety of different log locations and formats running through logstash.

I understand. I've shared this internally with the Fleet team for visibility - the feedback is appreciated!

Hi @mybyte, I'm the product manager for Fleet. Thanks for bringing up this discussion; it's an area we're tracking for improved functionality in our roadmap.

First, I'll point out that what @leandrojmp described - having a folder for each application - should work in Fleet. You can build one large combined agent policy that contains integrations for all of the applications. If they are not present on a host, they will be ignored.

In the custom logs integration, you can use the same configuration as the Filebeat log input (Log input | Filebeat Reference [7.16] | Elastic). Just put the YAML into the custom configuration box. We are working on copying this documentation to the agent (Add Filebeat input documentation to agent by dedemorton · Pull Request #424 · elastic/observability-docs · GitHub). You'd create one custom logs integration per log path, setting the custom fields in each. Does this meet your needs for custom fields per log path?
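As an illustration, the YAML pasted into that custom configuration box could look like this - these are standard log input options, but the multiline pattern and field values are hypothetical examples, not taken from this thread:

```yaml
multiline.pattern: '^\d{4}-\d{2}-\d{2}'  # hypothetical: a line starting with a date begins a new event
multiline.negate: true
multiline.match: after                   # continuation lines are appended to the preceding event
fields:
  app: AppA                              # hypothetical custom field, one value per custom logs integration
```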

Logstash output is also one of our highest priorities; we are tracking it here ([Fleet] Add a Logstash output type in Fleet settings · Issue #104987 · elastic/kibana · GitHub) for delivery in 8.x.

Thanks for the suggestion, I'll check it out!

Then again, keeping the configuration in git certainly has some advantages.