Filebeat 5.0.0-alpha3 multiple harvesters for same file sending the same messages over and over


(Dave White) #1

Hi,

I'm seeing something strange with filebeat 5.0.0-alpha3. I have prospectors configured on my mesos slave task stdout/stderr files and occasionally I see filebeat get stuck in a state where it's repeatedly sending the same log messages from the same file over and over again. In the latest case of this, I see the following in the filebeat logs around the time when the issue started:

/var/log/filebeat/filebeat.log:2016-08-01T16:39:22Z INFO Harvester started for file: /tmp/mesos/slave/slaves/d1dc259c-bd93-4ed6-8906-5124cab07048-S228/frameworks/Singularity/executors/registrator-7_dev_3-1468270708127-7-platform_apps_mesos_slave_3.node.us_east_1_z.consul-us_east_1c/runs/latest/stderr
/var/log/filebeat/filebeat.log.1:2016-08-01T16:20:14Z INFO Harvester started for file: /tmp/mesos/slave/slaves/d1dc259c-bd93-4ed6-8906-5124cab07048-S228/frameworks/Singularity/executors/registrator-7_dev_3-1468270708127-7-platform_apps_mesos_slave_3.node.us_east_1_z.consul-us_east_1c/runs/latest/stderr
/var/log/filebeat/filebeat.log.3:2016-08-01T16:20:11Z INFO Harvester started for file: /tmp/mesos/slave/slaves/d1dc259c-bd93-4ed6-8906-5124cab07048-S228/frameworks/Singularity/executors/registrator-7_dev_3-1468270708127-7-platform_apps_mesos_slave_3.node.us_east_1_z.consul-us_east_1c/runs/latest/stderr
/var/log/filebeat/filebeat.log.6:2016-08-01T16:20:06Z INFO Harvester started for file: /tmp/mesos/slave/slaves/d1dc259c-bd93-4ed6-8906-5124cab07048-S228/frameworks/Singularity/executors/registrator-7_dev_3-1468270708127-7-platform_apps_mesos_slave_3.node.us_east_1_z.consul-us_east_1c/runs/latest/stderr

The file in question is not very big:

> cat /tmp/mesos/slave/slaves/d1dc259c-bd93-4ed6-8906-5124cab07048-S228/frameworks/Singularity/executors/registrator-7_dev_3-1468270708127-7-platform_apps_mesos_slave_3.node.us_east_1_z.consul-us_east_1c/runs/latest/stderr | wc -l
62568

My prospector config for the mesos stderr taks logs looke like this:

prospectors:
  - paths:
    - "/tmp/mesos/slave/slaves/*/frameworks/*/executors/*/runs/latest/stderr*"
    exclude_files:
    - "^.+logstash.+$"
    - "^.+logrotate.conf$"
    ignore_older: 10m
    multiline:
      pattern: "^[[:space:]]"
      match: after

From the log output above, it looks like several harvesters were started within a handful of seconds of each other for the same file. Soon after this, I'm seeing the same log messages from this file forwarded over and over which ends up flooding my pipeline (I'm forwarding to kafka and indexing with logstash consuming from kafka). Is it possible that the multiple separate harvesters are each updating offsets which cause them to repeatedly reset and resend messages?

Note that I'm using 5.0.0-alpha-3. It looks like there was a lot of work in this area and I'm hopeful the changes in https://github.com/elastic/beats/issues/1600 might help. It's also unclear if all the new close_/clean_ options talked about in the 5.0.0-alpha4 documentation and the above issue are actually available in the 5.0.0-alpha4 release or are still yet to be released and are only in daily builds? Any hints on when the next 5.0.0 release will be coming?


(ruflin) #2

As you rightly pointed out, lots of work has gone into cleaning up / refactoring filebeat. An 5.0.0-alpha5 release should come up soonish with all these changes. Not all fixes are part of alpha4. It would be great if you could test the nightly build (which contains all this changes) to check if you still hit the same issue? https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

If multiple harvesters are started on one file, things get ugly. There was indeed a bug that this could happen in alpha3 in some edge cases for rotated files.


(system) #3

This topic was automatically closed after 21 days. New replies are no longer allowed.