Filestream Fingerprint Mode

Hi

I have a question about the new fingerprint mode in filestream, described in the Elastic blog post "Introducing Filestream fingerprint mode".

Does the file ID created by fingerprint use the filename as well as the content of the file? I am asking because I want to know what happens if you have two log files with the same content but different filenames, or the same name in different paths. In these scenarios, would the file ID be the same?

Thank you,
Kris

It appears that if two files have the same content but different filenames, fingerprint treats them as the same file. This is not ideal.

2024-02-13T20:56:24.39360761Z stderr F {"log.level":"warn","@timestamp":"2024-02-13T20:56:24.393Z","log.logger":"scanner","log.origin":{"file.name":"filestream/fswatch.go","file.line":394},"message":"\"/var/log/kristine2.log\" points to an already known ingest target \"/var/log/kristine1.log\" [0e42b7661026c2eb37c8597817e38366eef19794fb5bb60e143823c800658fb9==0e42b7661026c2eb37c8597817e38366eef19794fb5bb60e143823c800658fb9]. Skipping","service.name":"filebeat","ecs.version":"1.6.0"}

This is how fingerprint works: it uses the content of the file to create the fingerprint. By default it uses the first 1024 bytes. The filename doesn't matter.
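To make that concrete, here is a rough Python sketch of a content-only fingerprint. It is not Filebeat's code: the function name is made up, SHA-256 is assumed because the IDs in the log above are 64 hex characters long, and returning None for files shorter than the fingerprint length is an assumption about how short files are handled.

```python
import hashlib
import os
import tempfile

FINGERPRINT_LENGTH = 1024  # default length mentioned in the reply above


def fingerprint(path, offset=0, length=FINGERPRINT_LENGTH):
    """Hex SHA-256 of `length` bytes starting at `offset`.

    Returns None when the file is too short to fill the window
    (assumed behaviour, used here only for illustration).
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) < length:
        return None
    return hashlib.sha256(data).hexdigest()


def demo():
    # Two files with identical content but different names.
    content = b"same log line\n" * 100  # 1400 bytes, longer than the window
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "kristine1.log")
        b = os.path.join(d, "kristine2.log")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(content)
        return fingerprint(a), fingerprint(b)


if __name__ == "__main__":
    fa, fb = demo()
    print(fa == fb)  # the path never enters the hash
```

Because only the bytes are hashed, both files end up with the same ID, which matches the "points to an already known ingest target" warning.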

Yes, but it's not ideal. A more robust unique ID would incorporate the filename as well as the content.

mon-filebeat-r9fsc_monitor_filebeat-11c2cc496c0fe4da30ea8c5fde2cb75f884f7fe4184a95f53c201daf5f598951.log:2024-02-14T15:34:54.395762533Z stderr F {"log.level":"warn","@timestamp":"2024-02-14T15:34:54.395Z","log.logger":"scanner","log.origin":{"file.name":"filestream/fswatch.go","file.line":394},"message":"\"/var/log/keystone/keystone-all.log\" points to an already known ingest target \"/var/log/fm-api.log\" [5083cc1d5cdadaffbb4fa891b7969805cce5d57dcdab7ceb53bae8a54e0e8323==5083cc1d5cdadaffbb4fa891b7969805cce5d57dcdab7ceb53bae8a54e0e8323]. Skipping","service.name":"filebeat","ecs.version":"1.6.0"}
mon-filebeat-r9fsc_monitor_filebeat-11c2cc496c0fe4da30ea8c5fde2cb75f884f7fe4184a95f53c201daf5f598951.log:2024-02-14T15:34:54.398734982Z stderr F {"log.level":"warn","@timestamp":"2024-02-14T15:34:54.398Z","log.logger":"scanner","log.origin":{"file.name":"filestream/fswatch.go","file.line":394},"message":"\"/var/log/puppet/latest/puppet.log\" points to an already known ingest target \"/var/log/puppet/2024-02-13-15-44-07_aio/puppet.log\" [567235a799bc177d494ae0c8b153b9b5fdbf011ca9b1aaf50041db8b499d0015==567235a799bc177d494ae0c8b153b9b5fdbf011ca9b1aaf50041db8b499d0015]. Skipping","service.name":"filebeat","ecs.version":"1.6.0"}

Unfortunately this is not possible, and it would also lead to some issues.

For example, how would you deal with file rotation? If the fingerprint included the filename, then when the file is rotated the filename changes, and it would be interpreted as a new file.
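A small Python sketch of the rotation problem, under the same assumptions (SHA-256 over the first 1024 bytes). Both functions and the paths are hypothetical and only illustrate the two ID schemes being discussed:

```python
import hashlib


def content_id(data, length=1024):
    """ID derived from the first `length` bytes of content only."""
    return hashlib.sha256(data[:length]).hexdigest()


def name_and_content_id(name, data, length=1024):
    """Hypothetical ID that also mixes the path into the hash."""
    h = hashlib.sha256()
    h.update(name.encode("utf-8"))
    h.update(b"\0")  # separator so the name and content cannot run together
    h.update(data[:length])
    return h.hexdigest()


def rotation_demo():
    # Rotation renames the file; the bytes inside stay the same.
    data = b"x" * 2048
    before, after = "/var/log/app.log", "/var/log/app.log.1"
    return {
        "content_only_stable": content_id(data) == content_id(data),
        "name_plus_content_stable": (
            name_and_content_id(before, data) == name_and_content_id(after, data)
        ),
    }


if __name__ == "__main__":
    print(rotation_demo())
```

With the content-only ID the rotated file keeps its identity. With the name mixed in, the rename produces a different ID, so the rotated file would look like a new file.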

Each method has pros and cons. If fingerprint does not work for your use case, you can use one of the other methods to track files, or open a feature request to see whether this gets implemented.
