Hi!
I'm opening this topic before opening an issue on github.
I hope not made any mistakes, cause it took me quite a long to test it and to report it in here.
Thanks in advance for your kind help.
Regards
Scenario
- Logstash event generator writing to file "generator.log"
- FilebeatA reading from "generator.log" and writing to "filebeat.log", rotating files each 1MB
- FilebeatB reading from "filebeat.log" writing to "final.log", rotating each 3MB (really, rotation on this file can be disabled, but the problem persist)
Please, notice that:
- If rotation renames file appending .X (eg: file.X) and FilebeatB is listening to file*, most events will be re-procesed (ie: processed twice)
- If rotation creates a new file named with a timestamp (eg: file.YYYY.MM.dd.hh.mm.ss) and FilebeatB is listening to file*, most events will be re-procesed (ie: processed twice)
Result
FilebeatB seems to be losing events registered on FilebeatA due to rotation.
Events registered on filebeat.log.X doesn't exists on final.log*
Seems FilebeatA rotates file before filebeatB can read last file events.
Details
- logstash 5.0.0-alpha2 (PS: docker pull logstash:5.0.0-alpha3-1 -> Tag 5.0.0-alpha3-1 not found in repository docker.io/library/logstash )
- filebeat version 1.2.3 (amd64)
Configuration files
logstash.yml
input {
generator{
add_field => { "from" => "${HOSTNAME}" }
}
}
filter {}
output {
stdout {}
file {
path => "/tmp/generator.log"
flush_interval => 0
}
}
filebeatA.yml
filebeat:
prospectors:
-
paths:
- /tmp/log*
input_type: log
scan_frequency: 0s
output:
file:
path: "/tmp"
filename: filebeat.log
rotate_every_kb: 1000
number_of_files: 100
console:
shipper:
name: filebeatA
tags: ["service-X", "web-tier"]
logging:
to_syslog: false
to_files: false
level: info
filebeatB.yml
filebeat:
prospectors:
-
paths:
- /tmp/filebeat.log
input_type: log
scan_frequency: 0s
output:
file:
path: "/tmp"
filename: final.log
rotate_every_kb: 3000
number_of_files: 10
console:
shipper:
name: filebeatB
tags: ["service-X", "web-tier"]
logging:
to_syslog: false
to_files: false
level: info
Verification
This command checks for gaps on sequence numbers used by logstash generator.
ls -v filebeat.log* | xargs -i grep generator '{}' | sed 's/.*sequence\\\"://g' | sed 's/,.*//g' | sort -n | awk -v "p=1" '(p==1){p=$1-1}($1!=p+1){print p+1"-"$1-1}{p=$1}END{print "ok"}
This command checks for gaps on sequence number received from filebeat.
ls -v final.log* | xargs -i grep generator '{}' | sed 's/.*sequence\\\\\\"://g' | sed 's/,.*//g' | sort -n | awk -v "p=1" '(p==1){p=$1-1}($1!=p+1){print p+1"-"$1-1}{p=$1}END{print "ok"}'
If the above last command returns at least a pair of numbers, you can check those exists on _filebeat_ but doesn't exists on _final_.**
For example, if printed 344356-352529:
grep -H generator filebeat.log* | grep 344356
prints one line.
grep -H generator final.log* | grep 344356
doesn't print anything.
In fact, doing the query for the previous sequence value (344355), shows the filebeat.log.X filename (grep -H) is different (just truncated).
Proposed solution
- Each FilebeatA file is created with creation timestamp (eg: filebeat.log.YY-MM-dd-hh-mm-ss), hence FilebeatB will be able to read it until rotation happens.
- At the moment of rotation, a new file is created with the creation timestamp, but the previous one still exists to be processed by FilebeatB, which will close it after a certain timeout.