Even with `clean_removed: true` in the filestream configuration, registry entrie…s are never removed from the in-memory store.
## How to reproduce
1. Create the following `filebeat.yml`
```yaml
filebeat.inputs:
- type: filestream
id: test-clean-removed
enabled: true
clean_removed: true
close.on_state_change.inactive: 1s
prospector.scanner.check_interval: 1s
paths:
- /tmp/log.log
filebeat.registry:
file_permissions: 0600
cleanup_interval: 5s
flush: 0s
queue.mem:
flush.min_events: 8
flush.timeout: 0.1s
output.file:
path: ${path.home}
filename: "output-file"
rotate_every_kb: 10000
logging:
level: debug
selectors: ["*"]
```
2. Start Filebeat
```
./filebeat &
```
3. Create the log file
```
echo "fist file">> /tmp/log.log
```
4. Wait a few seconds, like 10s, for the file to be harvested (you can check the output file and/or logs)
5. Delete the file
```
rm /tmp/log.log
```
6. Wait a few more seconds, like 10s
7. Look the logs for the store clean up messages
```
{"log.level":"debug","@timestamp":"2023-10-05T12:18:46.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":66},"message":"Start store cleanup","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:46.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":79},"message":"No entries to remove were found","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:46.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":80},"message":"Done store cleanup","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:51.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":66},"message":"Start store cleanup","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:51.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":79},"message":"No entries to remove were found","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:51.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":80},"message":"Done store cleanup","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:56.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":66},"message":"Start store cleanup","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:56.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":79},"message":"No entries to remove were found","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-10-05T12:18:56.854+0200","log.logger":"input","log.origin":{"file.name":"input-logfile/clean.go","file.line":80},"message":"Done store cleanup","service.name":"filebeat","input_type":"filestream","ecs.version":"1.6.0"}
```
You will see that entries were never removed.
## Possible cause
I did a bit of investigation for the possible cause and I found that the resource containing the state of the file is never fully released, even after it's deletion.
Applying the following patch and the running the steps described above will show that the resource is never fully released.
```
diff --git a/filebeat/input/filestream/internal/input-logfile/store.go b/filebeat/input/filestream/internal/input-logfile/store.go
index 2b5b272..835654d 100644
--- a/filebeat/input/filestream/internal/input-logfile/store.go
+++ b/filebeat/input/filestream/internal/input-logfile/store.go
@@ -459,13 +459,20 @@ func (r *resource) isDeleted() bool {
// Retain is used to indicate that 'resource' gets an additional 'owner'.
// Owners of an resource can be active inputs or pending update operations
// not yet written to disk.
-func (r *resource) Retain() { r.pending.Inc() }
+func (r *resource) Retain() {
+ fmt.Println("========== RETAIN")
+ r.pending.Inc()
+}
// Release reduced the owner ship counter of the resource.
-func (r *resource) Release() { r.pending.Dec() }
+func (r *resource) Release() {
+ fmt.Println("========== RELEASE")
+ r.pending.Dec()
+}
// UpdatesReleaseN is used to release ownership of N pending update operations.
func (r *resource) UpdatesReleaseN(n uint) {
+ fmt.Println("========== RELEASE", n)
r.pending.Sub(uint64(n))
}
```
On my tests I saw
- 3 retain calls
- 2 release calls
- the file was deleted
- 1 retain call
This makes calls to [`Finished()`](https://github.com/elastic/beats/blob/2988148e3ea5b34b076f6eb72dde8a9f16060152/filebeat/input/filestream/internal/input-logfile/store.go#L474) never to return `true`, so those states are never cleaned.
## Related issues
This issue is likely the same described in https://github.com/elastic/beats/issues/36056, but because of the focus on `clean_removed` and the in-memory store rather than the registry files, I decided to open a different issue.