Filebeat 6.2.2, Ubuntu 14.04
We have a couple of servers that almost never ship logs. I think the problem is that at one point they had so many log files that their registry grew huge, and now Filebeat never gets a chance to clean it up. Here is the config:
filebeat.prospectors:
- type: log
  paths:
    - /vol/log/tsar/**.log
  close_inactive: 5m
  close_removed: true
filebeat.shutdown_timeout: 10s
filebeat.registry_flush: 30s
output.kafka:
  hosts: ["kafka-01.stg-ue1.zipaws.com:9092","kafka-02.stg-ue1.zipaws.com:9092","kafka-03.stg-ue1.zipaws.com:9092",]
  topic: unified-logs
  required_acks: 1
  compression: snappy
  client_id: 'filebeat.svc'
  keep_alive: 10m
  partition.round_robin:
    reachable_only: true
  codec.format:
    string: "%{[message]}"
logging.level: debug
To be concrete, here is how many entries are in the registry:
$ sudo cat /vol/filebeat-kafka-raw/lib/registry | jq -r '.[].source' | wc -l
365998
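The jq pipeline counts registry entries, not distinct files, so the same path can be counted more than once. A minimal Python sketch that reports both numbers, assuming the registry is a single JSON array of state objects with a `source` key (the shape the jq command above relies on); `registry_source_counts` is a hypothetical helper name:

```python
import json
from collections import Counter


def registry_source_counts(registry_path):
    """Count registry entries per `source` path.

    Assumes the registry is one JSON array of state objects, each with a
    "source" key, which is what `jq '.[].source'` reads.
    """
    with open(registry_path) as f:
        states = json.load(f)
    # One count per state entry, keyed by the path it was last seen under.
    return Counter(state["source"] for state in states)
```

Calling it on `/vol/filebeat-kafka-raw/lib/registry` (as root, as with the `sudo` above) gives `sum(counts.values())` for the total entries and `len(counts)` for the number of distinct paths.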
And here is how many files actually exist:
$ ls /vol/log/tsar | wc -l
750
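Comparing the two counts directly means checking each registry `source` against the filesystem. A minimal sketch, assuming the same JSON-array registry layout; `stale_registry_entries` is a hypothetical helper, and `ls | wc -l` counts every directory entry rather than only files matching the prospector glob, so the two numbers are only roughly comparable:

```python
import json
import os


def stale_registry_entries(registry_path):
    """Return registry `source` paths that no longer exist on disk.

    Only the path is checked. Filebeat identifies files by inode/device,
    so a reused file name would count as "live" here even if it is a
    different file.
    """
    with open(registry_path) as f:
        states = json.load(f)
    return [s["source"] for s in states if not os.path.exists(s["source"])]
```

If nearly all 365998 entries come back from this, the registry is almost entirely state for deleted files.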
Here's a fun snippet from the debug logs:
2018-03-26T09:40:22.642-0700 DEBUG [prospector] file/state.go:82 New state added for /vol/log/tsar/zr-resume-parser-20180204032744.log
2018-03-26T09:40:22.643-0700 DEBUG [registrar] registrar/registrar.go:193 Registrar states cleaned up. Before: 365998, After: 365998
2018-03-26T09:40:22.643-0700 DEBUG [registrar] registrar/registrar.go:200 Processing 1 events
2018-03-26T09:40:22.656-0700 DEBUG [registrar] registrar/registrar.go:193 Registrar states cleaned up. Before: 365998, After: 365998
2018-03-26T09:40:22.656-0700 DEBUG [registrar] registrar/registrar.go:200 Processing 1 events
2018-03-26T09:40:22.657-0700 DEBUG [prospector] file/state.go:82 New state added for /vol/log/tsar/zr-resume-parser-20180204022052.log
2018-03-26T09:40:22.669-0700 DEBUG [registrar] registrar/registrar.go:193 Registrar states cleaned up. Before: 365998, After: 365998
2018-03-26T09:40:22.669-0700 DEBUG [registrar] registrar/registrar.go:200 Processing 1 events
2018-03-26T09:40:22.670-0700 DEBUG [prospector] file/state.go:82 New state added for /vol/log/tsar/zr-resume-parser-20180204033051.log
2018-03-26T09:40:22.682-0700 DEBUG [prospector] file/state.go:82 New state added for /vol/log/tsar/zr-resume-parser-20180204033221.log
2018-03-26T09:40:22.682-0700 DEBUG [registrar] registrar/registrar.go:193 Registrar states cleaned up. Before: 365998, After: 365998
2018-03-26T09:40:22.682-0700 DEBUG [registrar] registrar/registrar.go:200 Processing 1 events
2018-03-26T09:40:22.689-0700 DEBUG [prospector] file/state.go:82 New state added for /vol/log/tsar/zr-resume-parser-20180204022931.log
2018-03-26T09:40:22.703-0700 DEBUG [registrar] registrar/registrar.go:193 Registrar states cleaned up. Before: 365998, After: 365998
2018-03-26T09:40:22.703-0700 DEBUG [registrar] registrar/registrar.go:200 Processing 1 events
2018-03-26T09:40:22.710-0700 DEBUG [prospector] file/state.go:82 New state added for /vol/log/tsar/zr-resume-parser-20180204025349.log
2018-03-26T09:40:22.722-0700 DEBUG [registrar] registrar/registrar.go:193 Registrar states cleaned up. Before: 365998, After: 365998
2018-03-26T09:40:22.722-0700 DEBUG [registrar] registrar/registrar.go:200 Processing 1 events
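Every cleanup line in that snippet reports `Before: 365998, After: 365998`, i.e. no states are being dropped. A small sketch for confirming that across a longer debug log; the regex is written against the exact message text shown above and may not match other Filebeat versions:

```python
import re

# Matches the registrar's cleanup summary, e.g.
# "Registrar states cleaned up. Before: 365998, After: 365998"
CLEANUP_RE = re.compile(r"Registrar states cleaned up\. Before: (\d+), After: (\d+)")


def cleanup_counts(lines):
    """Yield (before, after) pairs from registrar cleanup log lines."""
    for line in lines:
        m = CLEANUP_RE.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2))


def states_removed(lines):
    """Total number of states dropped across all cleanup passes."""
    return sum(before - after for before, after in cleanup_counts(lines))
```

Any iterable of lines works, including an open log file; a result of 0 over hours of logs means cleanup never removed anything.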
My guess is that if Filebeat removed missing files from the registry, it would eventually make progress, but as it stands, just getting through this loop takes about 24 minutes (I didn't time it; I just multiplied the number of registry entries by 4 ms).
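The 24-minute figure is plain arithmetic on the entry count; the 4 ms per entry is an estimate, not a measurement. As a quick check:

```python
def pass_duration_minutes(entries, ms_per_entry):
    """Minutes needed to handle `entries` items at `ms_per_entry` ms each."""
    return entries * ms_per_entry / 1000 / 60


# 365998 registry entries at roughly 4 ms each.
print(round(pass_duration_minutes(365998, 4), 1))  # → 24.4
```

At the same rate, the 750 files that actually exist would take about 3 seconds.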
Is there something I am missing in my config? I was under the impression that clean_removed (enabled by default) should handle this. Is my understanding correct that if I leave this alone long enough, it eventually will? (Note: I have left it alone for days on end.)
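clean_removed is meant to drop the state of files that can no longer be found on disk under their last known name. A minimal offline sketch of the same effect follows. It is not an official Filebeat procedure. It assumes Filebeat is stopped while it runs and that the registry is the JSON array the jq command reads. `prune_registry` and the `.bak`/`.tmp` file names are hypothetical, and checking only the path is a simplification of Filebeat's inode-based file identity.

```python
import json
import os
import shutil


def prune_registry(registry_path, backup_suffix=".bak"):
    """Drop registry states whose `source` path no longer exists.

    Hypothetical offline helper: run only while Filebeat is stopped.
    Returns (kept, dropped).
    """
    # Keep an untouched copy of the original registry next to it.
    shutil.copy2(registry_path, registry_path + backup_suffix)

    with open(registry_path) as f:
        states = json.load(f)

    # Path-only check; Filebeat itself identifies files by inode/device.
    kept = [s for s in states if os.path.exists(s["source"])]

    # Write to a temp file, then swap it in, so an interrupted write
    # cannot leave a truncated registry behind.
    tmp_path = registry_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(kept, f)
    shutil.copymode(registry_path, tmp_path)  # preserve original permissions
    os.replace(tmp_path, registry_path)

    return len(kept), len(states) - len(kept)
```

`os.replace` is atomic on POSIX only when the temp file and the registry are on the same filesystem, which holds here because both live in the same directory.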