Problem with updating Filebeat 8.11.4 to 8.13.4 - unwanted retransmission of all files

Hello.
I have a problem with updating Filebeat 8.11.4 to 8.13.4. I upgrade using the zip package for Windows and follow the procedure described in the documentation:

  1. Stop the filebeat service (old version).
  2. Download the .zip package for Windows and unpack it to the destination folder.
  3. Run the installation script .ps1 (mine is modified so that the data folder with the registry is located in the main filebeat folder, not in C:\ProgramData\filebeat as in the default installation script). My new filebeat service follows the filebeat_8_13_4 naming convention.
  4. Copy the data folder (with the registry in it and other files like meta.json etc.) and filebeat.yml from the old filebeat 8.11.4 to the new version 8.13.4.
  5. Start the new service.

The crazy part is: when I tested this on my dev environment (update from 8.11.4 to 8.13.4) and followed the procedure, it worked correctly. With the copied registry and filebeat.yml config file, the new filebeat did not retransmit all documents from scratch.
But when I do the same operation on the PROD env, filebeat ignores the copied registry and starts shipping all files again, creating duplicate data :frowning:
For the record, the dev and prod clusters are the same version, 8.13.4.

Does anybody have an idea what is wrong? It's driving me nuts ...
Kind regards

This really makes me believe that there is something on your prod environment that's making Filebeat use a different folder for the registry.

When Filebeat starts up it will log the folders it is using:

{
  "log.level": "info",
  "@timestamp": "2024-10-10T08:27:02.660-0400",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).configure",
    "file.name": "instance/beat.go",
    "file.line": 1058
  },
  "message": "Home path: [/tmp/go-build319583623/b001/exe] Config path: [/tmp/go-build319583623/b001/exe] Data path: [/tmp/go-build319583623/b001/exe/data] Logs path: [/tmp/go-build319583623/b001/exe/logs]",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}

Check if the data path is correct and the files are there.
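To compare the paths on dev and prod without reading the log by eye, the startup line can be parsed. This is a minimal sketch that assumes the message has the same "Name path: [value]" layout as the example above; the function name startup_paths and the sample paths are made up for illustration.

```python
import json
import re


def startup_paths(log_line: str) -> dict:
    """Extract the Home/Config/Data/Logs paths from one JSON log line."""
    message = json.loads(log_line).get("message", "")
    # Matches fragments such as "Data path: [/some/dir/data]".
    pairs = re.findall(r"(\w+) path: \[([^\]]*)\]", message)
    return {name.lower(): path for name, path in pairs}


if __name__ == "__main__":
    # Hypothetical log line, shortened to the field the function reads.
    line = json.dumps({
        "message": "Home path: [/opt/fb] Config path: [/opt/fb] "
                   "Data path: [/opt/fb/data] Logs path: [/opt/fb/logs]"
    })
    print(startup_paths(line)["data"])
```

Running it against the startup line from both environments and comparing the "data" values shows whether the two services really read the registry from the folder you copied.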

Another, more remote possibility is that something has changed with the file identity. Has anything changed in the host other than the Filebeat version?

By default Filebeat uses inode + device ID to identify files uniquely. Depending on your OS/file system, those can change in certain situations.

Here is a registry entry (${path.data}/registry/filebeat/log.json) from the log input:

{
  "k": "filebeat::logs::native::23201-39",
  "v": {
    "id": "native::23201-39",
    "prev_id": "",
    "offset": 0,
    "type": "log",
    "identifier_name": "native",
    "source": "/tmp/flog.log",
    "timestamp": [
      280445065136719,
      1728515609
    ],
    "ttl": -1,
    "FileStateOS": {
      "inode": 23201,
      "device": 39
    }
  }
}

If you look at "k": "filebeat::logs::native::23201-39", 23201 is the inode and 39 is the device ID. You can check if they have changed for your files.
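A small sketch of that check, assuming the key layout shown above (fields separated by "::", with the numbers in the last field joined by "-"). The function names are hypothetical, and current_identity relies on os.stat, which reports inode and device in this form on Linux-like systems; what a Windows registry stores in these positions may differ, so treat the comparison as an assumption there.

```python
import os


def parse_registry_key(key: str) -> dict:
    """Split a registry key into its identifier name and file identity."""
    parts = key.split("::")
    identifier, identity = parts[-2], parts[-1]
    # Only the "native" identity is a list of numbers; other identities
    # (for example a path) are left as a plain string.
    numbers = [int(n) for n in identity.split("-")] if identifier == "native" else []
    return {"identifier": identifier, "identity": identity, "numbers": numbers}


def current_identity(path: str) -> str:
    """Build an inode-device string for a file as the OS reports it now."""
    st = os.stat(path)
    return f"{st.st_ino}-{st.st_dev}"


if __name__ == "__main__":
    parsed = parse_registry_key("filebeat::logs::native::23201-39")
    print(parsed["numbers"])  # [23201, 39] -> inode 23201, device 39
    # Compare a stored identity with the file on disk (path is illustrative).
    print(parsed["identity"] == current_identity("."))
```

If the stored identity and the current one differ for a file that was not rotated or replaced, Filebeat will treat it as a new file.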

Hey. Thanks for quick reply.
To check the paths I started filebeat from the console with the debug option enabled. The paths are correct and the registry files are OK. The filebeat log even says that the registry file was loaded successfully.

Only thing is that the Cluster version was updated from 8.11.4 to 8.13.4.
I will follow and check if there is a mismatch between inode and device id. Will report back.
Thanks again.

Hey.
I have checked the files from the old version and the new version. In the old one I have
{"k":"filestream::.global::native::720893-118-1682839212",
It's different from what you showed. But if 720893 is the inode and 118 is the device ID, then the old log.json is almost the same, while in the new filebeat log.json there are many different sets of numbers.
Correct me if I'm wrong.
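One way to make this comparison concrete is to group the registry keys by source file in both log.json files and list the files whose keys differ. The sketch below rests on several assumptions: that log.json holds one JSON object per line, that an {"op": ...} line describes the entry on the line after it, and that the file path is stored under "source" (log input, as in the example earlier in the thread) or under "meta"."source" (assumed for filestream). All function names are made up for illustration.

```python
import json
from collections import defaultdict


def load_states(lines):
    """Replay registry log lines into a {key: value} dict."""
    states, op = {}, "set"
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        if "op" in entry:
            op = entry["op"]  # assumed to apply to the next "k" line
        elif "k" in entry:
            if op == "remove":
                states.pop(entry["k"], None)
            else:
                states[entry["k"]] = entry.get("v") or {}
    return states


def keys_by_source(states):
    """Group registry keys by the file path they refer to."""
    grouped = defaultdict(set)
    for key, value in states.items():
        meta = value.get("meta") or {}
        source = value.get("source") or meta.get("source")
        if source:
            grouped[source].add(key)
    return dict(grouped)


def changed_identities(old_lines, new_lines):
    """Return files present in both registries whose keys differ."""
    old = keys_by_source(load_states(old_lines))
    new = keys_by_source(load_states(new_lines))
    return {s: (old[s], new[s]) for s in old.keys() & new.keys() if old[s] != new[s]}


if __name__ == "__main__":
    # Hypothetical entries; real use would pass open("...log.json") instead.
    old = ['{"op":"set","id":1}',
           '{"k":"filestream::.global::native::1-2-3","v":{"meta":{"source":"/logs/a.log"}}}']
    new = ['{"op":"set","id":1}',
           '{"k":"filestream::.global::native::9-2-3","v":{"meta":{"source":"/logs/a.log"}}}']
    print(changed_identities(old, new))
```

A file that appears with one key in the old registry and a different key in the new one is a file whose identity changed between the two runs.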

Hey @elk1985,

If I understood it correctly, after the upgrade you noticed that the file identity (inode + device ID) changed for the same file, is that it?

If that's the case, yes, Filebeat will re-ingest the files. It happens that some OSes/filesystems that don't fully implement inodes will provide some sort of inode that ends up not being stable. The best option is to use the fingerprint file identity that does not rely on the filesystem, here is an example:

    - type: filestream
      id: my-unique-id
      paths:
        - /var/log/*.log
      prospector.scanner.fingerprint.enabled: true
      file_identity.fingerprint: ~

Another thing I noticed is that you didn't set a unique ID for your filestream input. If you run more than one filestream input without an ID, there will be data duplication/files being re-ingested. So make sure all of your filestream inputs have a unique ID.

One thing to bear in mind: because the ID is used to generate the key used in the registry to keep track of the file's offset, changing the input ID will cause re-ingestion. However, re-ingesting files once when fixing the config is better than having data re-ingested every time you restart Filebeat.
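The ID rule above can be checked mechanically before a rollout. This is a minimal sketch that assumes the inputs section of filebeat.yml has already been loaded into a list of dicts by whatever YAML parser you use; the function name and the sample inputs are hypothetical.

```python
from collections import Counter


def filestream_id_problems(inputs):
    """Return (inputs without an id, ids used more than once)."""
    filestreams = [i for i in inputs if i.get("type") == "filestream"]
    missing = [i for i in filestreams if not i.get("id")]
    counts = Counter(i["id"] for i in filestreams if i.get("id"))
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return missing, duplicates


if __name__ == "__main__":
    # Hypothetical inputs, shaped like a parsed filebeat.inputs list.
    inputs = [
        {"type": "filestream", "id": "app-logs", "paths": ["/var/log/app/*.log"]},
        {"type": "filestream", "paths": ["/var/log/other/*.log"]},
        {"type": "filestream", "id": "app-logs", "paths": ["/var/log/app2/*.log"]},
    ]
    missing, duplicates = filestream_id_problems(inputs)
    print(len(missing), duplicates)
```

Any input reported here should get its own ID before the upgrade, keeping in mind that assigning or changing the ID triggers the one-time re-ingestion described above.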