Take_over option not working - logs being reharvested after filebeat restart

I followed the migration guide to migrate my log inputs to filestream inputs, including adding a unique id and setting the "take_over" option to true. However, upon restarting the filebeat service, all of the logs are reharvested, resulting in a huge spike of millions of messages that brings my Graylog server to its knees.

I'd like to switch to filestreams since log inputs are deprecated and slated for removal, but if I can't migrate smoothly, I will not be able to roll it out to my production servers where the spike in reharvested logs would be 100x bigger.

Hi @ian.springer-sf, welcome to the community.

What version of the Stack and Beats are you running?

Can you share your filebeat config?

Question: are you running this in the same instance that already loaded the data with the log input? Just checking.

This method relies on reusing the registry data that tracks progress. If you just run this in a new Filebeat instance, it will try to load all the logs / files.

Hi Stephen,

Thanks!

I'm running filebeat 7.17.0. I was running 7.14.0 prior to the restart where I switched from log inputs to filestream inputs, in case that makes a difference.

Here's my filebeat.yml:

filebeat.modules: []
filebeat.inputs: []
logging.level: info
logging.to_files: true
logging.files:
  name: filebeat
  path: "/var/log/filebeat"
output.logstash:
  hosts:
  - t4log:5044
  ssl.enabled: true
  ssl.certificate: "/root/.client-certs/client-graylog.crt"
  ssl.key: "/root/.client-certs/client-graylog-key.pem"
  ssl.certificate_authorities:
  - "/root/.server-certs/server-graylog-ca-chain.crt"
  ssl.verification_mode: certificate
processors:
- copy_fields:
    fields:
    - from: agent.name
      to: agent.hostname
    fail_on_error: false
    ignore_missing: true
filebeat.registry.path: "/var/lib/filebeat/registry"
filebeat.config.inputs:
  enabled: true
  path: "/etc/filebeat/conf.d/*.yml"

And here's an example of one of my input configs (/etc/filebeat/conf.d/lwrp-prospector-integrator.yml) before:

- paths:
  - "/data/application-logs/integrator/integrator.log"
  exclude_lines: []
  tags:
  - integrator_main
  exclude_files: []
  tail_files: false
  ignore_older: 168h
  multiline.type: pattern
  multiline.pattern: "^([A-Z ]{5} \\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}|\\[Stage )"
  multiline.negate: true
  multiline.match: after

and after:

- type: filestream
  id: integrator
  take_over: true
  paths:
  - "/data/application-logs/integrator/integrator.log"
  exclude_lines: []
  tags:
  - integrator_main
  ignore_older: 168h
  prospector.scanner.exclude_files: []
  close.on_state_change.inactive: 96h
  close.on_state_change.removed: true
  parsers:
  - multiline:
      type: pattern
      pattern: "^([A-Z ]{5} \\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}|\\[Stage )"
      negate: true
      match: after

Yes, I'm running it in the same instance that was previously using log inputs. I updated the input configs, updated the filebeat binaries from 7.14.0 to 7.17.0, and restarted the filebeat service. I then saw the flood of duplicate log messages in Graylog.

Ahhhh ... think I got it...

take_over is a brand new 8.7 feature (it's not even in 8.6).

You are using 7.17, where the migration is not as clean. I think you are looking at the 8.7 docs, but you need to be looking at the 7.17 docs.

From the 7.17 docs:

Step 2: Exclude all processed files

Filebeat does not provide access to the state information of different inputs. Hence, the filestream input cannot access the state information of a log input in the Filebeat registry. You must exclude the files the log input has processed or is processing. If you do not exclude those files, you will end up with duplicate events in the output.
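
For illustration, here's a rough sketch of what that step could look like on 7.17. It's only a sketch: the wildcard path and the rotated-file naming (integrator.log.1, integrator.log.2, ...) are assumptions made for the example, not taken from the config above. The prospector.scanner.exclude_files option takes a list of regular expressions, and the filestream input skips any file that matches one of them.

```yaml
# Sketch only: a 7.17 filestream input that skips files the old log input already read.
# ASSUMPTION: rotated files are named integrator.log.1, integrator.log.2, ...
- type: filestream
  id: integrator
  paths:
  - "/data/application-logs/integrator/integrator.log*"
  # Regexes for files the log input has already processed (hypothetical naming).
  prospector.scanner.exclude_files: ['\.log\.\d+$']
```

Note that this only helps for files that are already finished. A file that is still being written can't be excluded without also stopping collection from it, so as far as I can tell a large, unrotated active file would still be read from the beginning by the new filestream input on 7.17.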

Do you truly only have a single log file?

Ah, damn - nice catch!

I noticed in the upgrade docs that it's recommended to upgrade to 7.17 first before upgrading to 8.x. Considering that, it sounds like I will have to do the following:

  1. switch back to log inputs
  2. upgrade to 7.17.0
  3. upgrade to 8.7.0
  4. migrate to filestream inputs using take_over option

Does that look correct to you?

We do rotate some of our log files, but not all. And even for the ones we rotate, we let them grow pretty large before rotating, so we would want a smooth transition that avoids sending lots of duplicate messages.

Looks pretty good....

Maybe, just before and for a short while after the migration, rotate into smaller files. Then, once everything is running well, go back to big files.
