Filebeat creating duplicates on SMB log file sources

Hi,

I am facing the following challenge. The log files I want to watch with Filebeat and then send to Graylog are only accessible via an SMB mount. I have already mounted the share with these options:

cache=none,actimeo=0

What I am seeing is that when a new entry is appended to the end of the log file, Filebeat does notice it, but it rereads much of the file (as far as I can tell, the entire file) and sends out events for past entries again. I suspect this is caused by the log files being on an SMB share (from Filebeat's point of view).

This seems consistent with the following log entry:

log.level=info
@timestamp=2024-11-19T22:52:22.631+0100
log.logger=input.harvester
log.origin.file.line=330
log.origin.file.name=log/harvester.go
log.origin.function=github.com/elastic/beats/v7/filebeat/input/log.(*Harvester).Run
message=File was truncated. Begin reading file from offset 0.
service.name=filebeat
input_id=0dc3d135-3a2f-4e5d-a640-397a209b52c8
source_file=/logs/c4/debug/chowmain_event_logger.log
state_id=native::786563-3145860
finished=false
os_id=786563-3145860
harvester_id=663d3ac0-04f8-4eaf-a4cf-252ae652ba0a
ecs.version=1.6.0

Any ideas on how to tackle this? Unfortunately, I have no control over the SMB server. There is currently no log rotation in place (or at least none happened during these tests). Whenever a new line is written, the entire file is reread and sent out as events.

Can you share your Filebeat config?

That message is logged when Filebeat checks the size of the file and the size reported by the filesystem is smaller than the offset Filebeat recorded the last time it read from the file.

In other words, as an example: Filebeat thinks it is 500 KB into the file, but the filesystem reports that the file is only 400 KB long.

Depending on the cause, this can sometimes be addressed by enabling fingerprinting on the filestream input (see "filestream input" in the Filebeat Reference [8.16]):

prospector.scanner.fingerprint:
  enabled: true
  offset: 0
  length: 1024
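For context, here is a minimal sketch of where these settings sit in a complete filestream input, assuming Filebeat 8.16 option names; the `id` value is a hypothetical placeholder. As I read the 8.16 reference, the scanner fingerprint is paired with `file_identity.fingerprint` so that the file ID is derived from the file's content rather than from its inode and device ID:

```yaml
filebeat.inputs:
  - type: filestream
    # Hypothetical ID; each filestream input should have its own unique, stable ID
    id: chowmain-event-logger
    paths:
      - /logs/c4/debug/chowmain_event_logger.log
    # Hash bytes [offset, offset + length) of the file to build its fingerprint
    prospector.scanner.fingerprint:
      enabled: true
      offset: 0
      length: 1024
    # Use that fingerprint as the file identity (the default "native" identity uses inode + device ID)
    file_identity.fingerprint: ~
```

As far as I know, a file shorter than `offset + length` (1024 bytes here) is not read until it grows past that size.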

Sure:

filebeat.inputs:
  - type: log
    enabled: true
    encoding: plain
    paths:
      - /logs/c4/debug/chowmain_event_logger.log
    fields_under_root: true
    fields:
      host.name: "ca10"
    ignore_older: 0
    scan_frequency: 10s
    close_inactive: 5m
    close_eof: true

output.console:
  pretty: true

Filebeat is running inside a Docker container, and the SMB mount is bind-mounted into the container as a volume. How does fingerprinting help in this case?

And thanks for the fast response!

The first change you should make is to switch to the filestream input, as the log input is deprecated (see "filestream input" in the Filebeat Reference [8.16]).

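I'm not entirely convinced fingerprinting will fix it, but with network volumes there's no guarantee that the file ID (by default derived from the inode and device ID, which is the `native::786563-3145860` state ID in your log) remains stable.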

Fingerprinting generates the file's ID from the first 1024 bytes of the file (with the settings above). Fingerprinting is almost always the first recommendation when network volumes are in play.

The filestream input also logs the offset and the file size received from the filesystem when it detects truncation, which might be useful additional information here.

Ignore my "how would it help?" question. I just read the docs on this; I had mistaken it for the other fingerprinting feature (the one that hashes event fields).

Preliminary first test: this seems to have done the trick. It needs 1024 bytes of new data/changes, but that is acceptable. I will keep monitoring this. You are a lifesaver, and I hate myself once more for spending hours investigating instead of asking the knowledgeable people earlier!

Thanks. Much appreciated!


Fingers crossed that it solves your problem!

I also followed your advice and switched to filestream:

filebeat.inputs:
  - type: filestream
    enabled: true
    encoding: plain
    paths:
      - /logs/c4/debug/chowmain_event_logger.log
    fields_under_root: true
    fields:
      host.name: "ca10"
    ignore_older: 0
    scan_frequency: 10s
    close_inactive: 5m
    close_eof: true
    prospector.scanner.check_inode: false  # Optional for SMB reliability
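One thing worth double-checking in the config above: `scan_frequency`, `close_inactive` and `close_eof` are log input option names. As far as I can tell from the 8.16 filestream reference, the filestream equivalents are `prospector.scanner.check_interval`, `close.on_state_change.inactive` and `close.reader.on_eof`, and I could not find `prospector.scanner.check_inode` there at all. The fingerprint settings discussed earlier are also not part of this snippet, and the reference warns that a missing or changing filestream `id` can itself cause duplicates. A sketch of how the input might look with those adjustments, assuming Filebeat 8.16 and a hypothetical `id`:

```yaml
filebeat.inputs:
  - type: filestream
    # Hypothetical ID; keep it unique and stable across restarts
    id: chowmain-event-logger
    enabled: true
    encoding: plain
    paths:
      - /logs/c4/debug/chowmain_event_logger.log
    fields_under_root: true
    fields:
      host.name: "ca10"
    ignore_older: 0
    # filestream names for the former scan_frequency / close_inactive / close_eof
    prospector.scanner.check_interval: 10s
    close.on_state_change.inactive: 5m
    close.reader.on_eof: true
    # Identify the file by a hash of its first 1024 bytes instead of inode + device ID
    prospector.scanner.fingerprint:
      enabled: true
      offset: 0
      length: 1024
    file_identity.fingerprint: ~

output.console:
  pretty: true
```

It is worth running `filebeat test config` against the result to confirm the option names are accepted by your exact version.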