There are multiple threads on this, but they're locked now. I think the existing solutions are all inadequate for a server environment with shared hardware/network infrastructure, where "playing nice" is important.
Describe the enhancement:
Add rate limiting to the configuration options available to the various Beats.
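To make the ask concrete, here is a hypothetical `filebeat.yml` fragment showing what such an option might look like. None of these option names exist today; this is purely a sketch of the shape of the feature:

```yaml
# Hypothetical filebeat.yml fragment -- these rate_limit options do not exist yet.
output.elasticsearch:
  hosts: ["https://es.example.com:9200"]
  # Proposed: cap outbound throughput per Beat instance.
  rate_limit:
    bytes_per_second: 5MiB   # sustained rate
    burst: 10MiB             # short-term burst allowance
```

The point is that the limit would live in the same centralized config file as everything else, not in an external `tc` script.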
The current docs suggest using traffic control (`tc`) to add rate limiting. This is really suboptimal, for a few reasons:

- `tc` is incredibly difficult to use. I learned just enough of it to write the commands for the throttling I wanted. Looking back only a few months later, I have absolutely no clue what they mean. It's an inevitable maintenance problem.
- It mutates global state and breaks the encapsulation of this application. What if another application also runs a script that configures `tc` for its own traffic shaping? It's a collision waiting to happen.
- The rate-limiting configuration is encoded into whatever script sets up `tc`. In effect, that script is now a "second" config file, meaning you no longer have a centralized place for all your Filebeat configuration needs.
- It's not cross-platform.
Go offers several good rate-limiting options (e.g. `golang.org/x/time/rate`). There's a better way.
Describe a specific use case for the enhancement or feature:
- A common issue is that Filebeat "builds a backlog" (by remembering "where it left off") when Elasticsearch becomes temporarily unavailable. When it comes back up, Filebeat hammers it. If you have lots of hosts, they're all competing, greedily trying to run as fast as possible, causing all kinds of load and brown out issues.
- More specifically, my personal issue is that my Filebeat agents and Elasticsearch clusters have negotiated a speed that works for them, but is too fast for our network infrastructure, which I don't have the power to change.
- Filebeat/Elasticsearch also has a strange tendency to sometimes spike traffic higher than normal while the cluster is overloaded. It seems counter-intuitive, but I haven't found reliable steps to reproduce it.