Does the file ID created with fingerprint use the filename as well as the content of the file? I am asking because I wonder what happens if you have two log files with the same content and two different filenames, or even the same name but a different path. In these scenarios, would the file ID be the same?
It appears that if two files have the same content but different filenames, fingerprint treats them as the same file. This is not ideal.
2024-02-13T20:56:24.39360761Z stderr F {"log.level":"warn","@timestamp":"2024-02-13T20:56:24.393Z","log.logger":"scanner","log.origin":{"file.name":"filestream/fswatch.go","file.line":394},"message":"\"/var/log/kristine2.log\" points to an already known ingest target \"/var/log/kristine1.log\" [0e42b7661026c2eb37c8597817e38366eef19794fb5bb60e143823c800658fb9==0e42b7661026c2eb37c8597817e38366eef19794fb5bb60e143823c800658fb9]. Skipping","service.name":"filebeat","ecs.version":"1.6.0"}
This is how the fingerprint works: it uses the content of the file to create the fingerprint. By default it uses the first 1024 bytes, and the filename doesn't matter.
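To illustrate, here is a minimal Python sketch of a content-only fingerprint. It assumes SHA-256 over the byte range (the 64-hex-digit values in the log lines above are consistent with that) and the 1024-byte, offset-0 default mentioned here. It is not Filebeat's actual code, and the function names are made up. The path is only used to open the file, so two files with identical leading bytes get the same ID wherever they live.

```python
import hashlib
import os
import tempfile
from typing import Optional, Tuple

# Defaults described above: hash 1024 bytes starting at offset 0.
DEFAULT_OFFSET = 0
DEFAULT_LENGTH = 1024


def content_fingerprint(path: str,
                        offset: int = DEFAULT_OFFSET,
                        length: int = DEFAULT_LENGTH) -> Optional[str]:
    """Hex SHA-256 of `length` bytes read from `offset`.

    The path is only used to open the file; it is never hashed.
    Assumption: a file shorter than offset + length has no fingerprint
    yet (returns None) and would be skipped until it grows.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) < length:
        return None
    return hashlib.sha256(data).hexdigest()


def demo_same_content() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Two same-content files in different directories, plus one short file."""
    content = b"2024-02-13 example log line\n" * 100  # well over 1024 bytes
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "a"))
        os.makedirs(os.path.join(d, "b"))
        first = os.path.join(d, "a", "kristine1.log")
        second = os.path.join(d, "b", "kristine2.log")
        short = os.path.join(d, "short.log")
        for p, data in ((first, content), (second, content), (short, b"tiny\n")):
            with open(p, "wb") as f:
                f.write(data)
        return (content_fingerprint(first),
                content_fingerprint(second),
                content_fingerprint(short))


if __name__ == "__main__":
    fp1, fp2, fp_short = demo_same_content()
    print(fp1 == fp2)  # True: name and directory play no part
    print(fp_short)    # None: fewer than 1024 bytes so far
```

This is why kristine2.log is reported as pointing to kristine1.log: both hash to the same value.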
Yes, but it's not ideal. A more robust unique ID would incorporate the filename as well as the content.
mon-filebeat-r9fsc_monitor_filebeat-11c2cc496c0fe4da30ea8c5fde2cb75f884f7fe4184a95f53c201daf5f598951.log:2024-02-14T15:34:54.395762533Z stderr F {"log.level":"warn","@timestamp":"2024-02-14T15:34:54.395Z","log.logger":"scanner","log.origin":{"file.name":"filestream/fswatch.go","file.line":394},"message":"\"/var/log/keystone/keystone-all.log\" points to an already known ingest target \"/var/log/fm-api.log\" [5083cc1d5cdadaffbb4fa891b7969805cce5d57dcdab7ceb53bae8a54e0e8323==5083cc1d5cdadaffbb4fa891b7969805cce5d57dcdab7ceb53bae8a54e0e8323]. Skipping","service.name":"filebeat","ecs.version":"1.6.0"}
mon-filebeat-r9fsc_monitor_filebeat-11c2cc496c0fe4da30ea8c5fde2cb75f884f7fe4184a95f53c201daf5f598951.log:2024-02-14T15:34:54.398734982Z stderr F {"log.level":"warn","@timestamp":"2024-02-14T15:34:54.398Z","log.logger":"scanner","log.origin":{"file.name":"filestream/fswatch.go","file.line":394},"message":"\"/var/log/puppet/latest/puppet.log\" points to an already known ingest target \"/var/log/puppet/2024-02-13-15-44-07_aio/puppet.log\" [567235a799bc177d494ae0c8b153b9b5fdbf011ca9b1aaf50041db8b499d0015==567235a799bc177d494ae0c8b153b9b5fdbf011ca9b1aaf50041db8b499d0015]. Skipping","service.name":"filebeat","ecs.version":"1.6.0"}
Unfortunately this is not possible, and it would also lead to some issues.
For example, how would you deal with file rotation? If the fingerprint included the filename, the rotated file would have a new name and would be interpreted as a new file.
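A small sketch of the rotation problem, comparing a content-only ID with a hypothetical ID that also hashes the absolute path (the filename-plus-content idea from this thread). Both functions are illustrative, not Filebeat code, and rotation is simulated as a plain rename (app.log to app.log.1), which is how many rotation setups behave.

```python
import hashlib
import os
import tempfile
from typing import Tuple

LENGTH = 1024  # bytes hashed from the start of the file


def content_id(path: str) -> str:
    """Content-only ID, like the fingerprint: SHA-256 of the first LENGTH bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(LENGTH)).hexdigest()


def path_and_content_id(path: str) -> str:
    """Hypothetical ID that mixes the absolute path into the hash."""
    h = hashlib.sha256(os.path.abspath(path).encode())
    h.update(b"\0")  # separator; NUL cannot appear in a path
    with open(path, "rb") as f:
        h.update(f.read(LENGTH))
    return h.hexdigest()


def ids_across_rotation() -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Simulate a rename-style rotation (app.log -> app.log.1).

    Returns ((content_before, combined_before), (content_after, combined_after)).
    """
    with tempfile.TemporaryDirectory() as d:
        live = os.path.join(d, "app.log")
        rotated = os.path.join(d, "app.log.1")
        with open(live, "wb") as f:
            f.write(b"x" * 2048)
        before = (content_id(live), path_and_content_id(live))
        os.rename(live, rotated)  # same bytes, new name
        after = (content_id(rotated), path_and_content_id(rotated))
    return before, after


if __name__ == "__main__":
    before, after = ids_across_rotation()
    print(before[0] == after[0])  # True: content-only ID survives rotation
    print(before[1] == after[1])  # False: path-based ID changes
```

With the content-only ID, the renamed file is still recognized, so reading can continue from the stored offset. With the path-based ID, the rotated file looks brand new and would likely be read again from the start, duplicating events.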
Each method has pros and cons. If the fingerprint does not work for your use case, you can use one of the other methods to track the file, or open a feature request to see if this can be implemented.
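For comparison, the `native` file identity is based on the device and inode rather than the content. Below is a rough sketch of that idea, assuming a POSIX-like filesystem; the "dev-inode" string format and the function names are made up, and Filebeat's real implementation differs. Files with identical content get different IDs, and a rename keeps the same ID.

```python
import os
import tempfile
from typing import Tuple


def native_id(path: str) -> str:
    """Device + inode ID, in the spirit of the `native` file identity.

    The "dev-inode" string format is hypothetical, not Filebeat's.
    """
    st = os.stat(path)
    return f"{st.st_dev}-{st.st_ino}"


def native_ids_demo() -> Tuple[str, str, str]:
    """IDs for two same-content files, then for the first one after a rename."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "kristine1.log")
        second = os.path.join(d, "kristine2.log")
        for p in (first, second):
            with open(p, "wb") as f:
                f.write(b"same content\n" * 100)
        id_first, id_second = native_id(first), native_id(second)
        rotated = first + ".1"
        os.rename(first, rotated)  # a rename within one filesystem keeps the inode
        id_rotated = native_id(rotated)
    return id_first, id_second, id_rotated
```

The trade-off is that inodes can be reused after a file is deleted, so a new file can end up looking like an old one, which is one of the cases fingerprint is meant to avoid.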