This is a new feature I'd like to contribute, but I wanted to get some feedback first. The idea is to limit the number of events published, essentially scaling back the output and thereby the pressure put on downstream collectors. The events would be sampled as a random subset of the logs rather than selected by content, which is to say it wouldn't strictly be condition-based filtering.
It would be a useful feature to help build out downstream processing during a scaling event: it could be turned on dynamically while collectors are overloaded, and then turned off once the collectors have been scaled up for the increased load. It would also be useful simply to reduce the required throughput and storage when a specific Filebeat instance is reading a high-throughput source but its logs are not critical.
Here are a couple of ways I believe it could be done:
- Write a processor that looks for hints (like a Kubernetes annotation) and then uses the value of the annotation as a threshold to decide whether the event should be published. It could be as simple as generating a random float and shipping the event if the float is less than the value provided in the annotation, otherwise dropping it. In addition, it could record some metrics: the number of events skipped and published, the most recent threshold used (perhaps), the percentage sampled, etc. These could be published with the other metrics that Filebeat tracks and logs by default every 30 seconds. (See the first sketch below.)
- Another idea would be to write a processor that simply injects a random floating-point number into each event, and then configure conditions that filter events out based on some kind of threshold on that field. An AND condition could be used to further refine the sampling based on annotations. This is flexible, but it lacks the visibility of the first approach into the number of logs skipped, the number published, and the sampling rate. Perhaps there is some way to add metrics on top of this; I would need some guidance there. (See the second sketch below.)
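To make the first option more concrete, here is a minimal, self-contained Go sketch of the sampling logic only. It does not use libbeat's actual processor interface, config handling, or metrics registry; the `Event` type, the annotation key `kubernetes.annotations.sampling-rate`, and the counters are stand-ins to illustrate the random-threshold check and the skipped/published bookkeeping.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"sync/atomic"
)

// Event is a simplified stand-in for the real event type; only the
// fields needed for this sketch are modeled.
type Event struct {
	Fields map[string]interface{}
}

// sampleProcessor drops events probabilistically based on a per-pod
// annotation (the annotation key used here is hypothetical).
type sampleProcessor struct {
	annotationKey string  // e.g. "kubernetes.annotations.sampling-rate"
	defaultRate   float64 // used when the annotation is missing or invalid

	published uint64 // events kept
	dropped   uint64 // events dropped
}

// Run returns the event if it should be published, or nil if it should
// be dropped. The counters are what could later be exposed alongside
// Filebeat's periodic internal metrics.
func (p *sampleProcessor) Run(event *Event) *Event {
	rate := p.defaultRate
	if raw, ok := event.Fields[p.annotationKey]; ok {
		if s, ok := raw.(string); ok {
			if parsed, err := strconv.ParseFloat(s, 64); err == nil && parsed >= 0 && parsed <= 1 {
				rate = parsed
			}
		}
	}

	// Keep the event with probability `rate`.
	if rand.Float64() < rate {
		atomic.AddUint64(&p.published, 1)
		return event
	}
	atomic.AddUint64(&p.dropped, 1)
	return nil
}

func main() {
	p := &sampleProcessor{
		annotationKey: "kubernetes.annotations.sampling-rate",
		defaultRate:   1.0, // publish everything unless an annotation says otherwise
	}

	// Feed 10,000 events annotated with a 10% sampling rate.
	for i := 0; i < 10000; i++ {
		ev := &Event{Fields: map[string]interface{}{
			"kubernetes.annotations.sampling-rate": "0.1",
			"message":                              fmt.Sprintf("log line %d", i),
		}}
		p.Run(ev)
	}
	fmt.Printf("published=%d dropped=%d\n", p.published, p.dropped)
}
```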
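And here is a similarly simplified sketch of the second option, showing the two pieces composed: one step injects a random float into each event, and a separate threshold check (standing in for a condition that would be configured on top of the injected field, possibly ANDed with an annotation check) drops events above it. The field name `sampling.random` and the threshold value are illustrative only.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Event is a simplified stand-in for the real event type.
type Event struct {
	Fields map[string]interface{}
}

// injectRandom sets a random float on every event. In the proposal this
// would be its own small processor; "sampling.random" is just an example
// field name.
func injectRandom(event *Event) *Event {
	event.Fields["sampling.random"] = rand.Float64()
	return event
}

// dropAboveThreshold mimics what a condition-based drop step could do
// once the random field exists: drop any event whose injected value is
// above the threshold. A real setup would express this as a condition in
// the Filebeat configuration rather than in code.
func dropAboveThreshold(event *Event, threshold float64) *Event {
	if v, ok := event.Fields["sampling.random"].(float64); ok && v > threshold {
		return nil // dropped
	}
	return event
}

func main() {
	const threshold = 0.1 // keep roughly 10% of events

	kept := 0
	for i := 0; i < 10000; i++ {
		ev := &Event{Fields: map[string]interface{}{
			"message": fmt.Sprintf("log line %d", i),
		}}
		if out := dropAboveThreshold(injectRandom(ev), threshold); out != nil {
			kept++
		}
	}
	// There is no natural place to report these counts in this approach,
	// which is the visibility gap mentioned above.
	fmt.Printf("kept=%d of 10000\n", kept)
}
```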
Interested in hearing any feedback on the idea.
Thanks,
L-