Filebeat too quick on recovering data

I am running ELK 6.6 on a CentOS 7 box. I have filebeat configured on a Windows machine to forward specific logs to Logstash.
The problem is that my Logstash filter has a throttling mechanism set up so as not to overwhelm the ELK box when live data is being shipped from a production system. However, if Filebeat was down (and this happened), then on restart it is expected to ship the old files it had missed. Filebeat does do this, but it processes the log file (~3280 KB) so quickly that everything is shipped within a single throttling period, which causes it to drop the majority of the logs after the 1002nd. I would like to be able to control the rate at which the data is sent, and/or perhaps have a custom throttling rate for older logs. Can either of these be done in a way that doesn't affect my semi-live log shipping and processing filters?

My filebeat.yml file:

    filebeat.inputs:
    - type: log
      enabled: true
      paths:
        - c:\path\to\log\files

    filebeat.config.modules:
      # Glob pattern for configuration loading
      path: ${path.config}/modules.d/*.yml

      # Set to true to enable config reloading
      reload.enabled: false

    setup.template.settings:
      index.number_of_shards: 3

    name: Node
    fields_under_root: true
    fields:
      env: dev
      role: Node
      node: Node

    output.logstash:
      # The Logstash hosts
      hosts: [""]
      index: myindex

My logstash filter (the relevant part):

    filter {
        throttle {
            period => 30
            max_age => 60
            after_count => 1000
            key => "%{host}"
            add_tag => "throttled"
        }
        if "throttled" in [tags] {
            throttle {
                period => 60
                max_age => 120
                after_count => 2
                key => "%{host}"
                add_tag => "drop"
            }
            if "drop" in [tags] {
                drop { }
            }
        }
    }

Any help would be greatly appreciated.
Thank you


Unfortunately, it does not seem that Filebeat has had such a mechanism so far. There is an interesting related discussion at Throttling log output from Filebeat directly?

However, configuring some network limits may be helpful in your case.



Hi Chris!
Thanks for your reply.
Unfortunately, our infrastructure is rather complicated, and these logs are critical when monitoring our systems, so placing network limits may cause performance issues with live logs, which would be a problem.
I may look into adjusting the Logstash filter to throttle differently depending on the log file name (if it is possible to compare the date in the log file name to the current date, but that's a discussion for another topic, I assume).
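The filename-date idea could be sketched roughly as follows. This is only a sketch under several assumptions: that the filenames contain a date (here a hypothetical `app-YYYY-MM-DD.log` pattern), that the file path arrives in the Beats `source` field (the Filebeat 6.x default), and that the `log_year`/`log_month`/`log_day` fields and the `backlog` tag are made-up names for illustration:

    filter {
        # Hypothetical: extract the date from a filename like app-2019-01-15.log
        grok {
            match => { "source" => "app-%{YEAR:log_year}-%{MONTHNUM:log_month}-%{MONTHDAY:log_day}\.log" }
            tag_on_failure => ["no_file_date"]
        }
        ruby {
            code => '
                begin
                    file_date = Time.utc(event.get("log_year").to_i,
                                         event.get("log_month").to_i,
                                         event.get("log_day").to_i)
                    # Treat files older than a day as a backlog replay
                    event.tag("backlog") if (Time.now.utc - file_date) > 86400
                rescue
                    # Leave the event untagged if the date fields are missing
                end
            '
        }
        if "backlog" in [tags] {
            # A separate, looser throttle for replayed history,
            # keyed apart from the live stream
            throttle {
                period => 30
                max_age => 60
                after_count => 100
                key => "%{host}-backlog"
                add_tag => "throttled"
            }
        }
    }

The separate throttle key keeps the backlog replay from consuming the live stream's event budget, at the cost of a `ruby` filter invocation per event.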
Anyways thanks for the suggestion!

Added a Logstash issue to track this.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.