Filebeat is losing events due to file rotate

Hi!

I'm opening this topic before opening an issue on github.
I hope I haven't made any mistakes, because it took me quite a long time to test this and to report it here.
Thanks in advance for your kind help.

Regards

Scenario

  • Logstash event generator writing to file "generator.log"
  • FilebeatA reading from "generator.log" and writing to "filebeat.log", rotating files every 1 MB
  • FilebeatB reading from "filebeat.log" and writing to "final.log", rotating every 3 MB (rotation on this file can actually be disabled, but the problem persists)

Please, notice that:

  • If rotation renames the file by appending .X (e.g. file.X) and FilebeatB is listening to file*, most events will be re-processed (i.e. processed twice)
  • If rotation creates a new file named with a timestamp (e.g. file.YYYY.MM.dd.hh.mm.ss) and FilebeatB is listening to file*, most events will be re-processed (i.e. processed twice)
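To make the rename case concrete, here is a minimal sketch (file names are illustrative) of what rename-style rotation looks like on disk: the rotated file keeps its contents but gains a new name that still matches the file* glob, so a reader watching that glob sees it as a candidate again.

```shell
# Simulate rename-based rotation of a log file.
echo "event-1" > /tmp/rot_demo.log
mv /tmp/rot_demo.log /tmp/rot_demo.log.1   # rotation: same data, new name
: > /tmp/rot_demo.log                      # fresh empty file under the old name
ls /tmp/rot_demo.log*                      # both names now match the glob
```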

Result

FilebeatB seems to be losing events registered by FilebeatA due to rotation.
Events present in filebeat.log.X don't exist in final.log*.

It seems FilebeatA rotates the file before FilebeatB can read its last events.

Details

  • logstash 5.0.0-alpha2 (PS: docker pull logstash:5.0.0-alpha3-1 -> Tag 5.0.0-alpha3-1 not found in repository docker.io/library/logstash :slight_smile:)
  • filebeat version 1.2.3 (amd64)

Configuration files

logstash.yml

input {
    generator{
        add_field => { "from" => "${HOSTNAME}" }
    }
}
filter {}
output {
    stdout {}
    file {
        path => "/tmp/generator.log"
        flush_interval => 0
    }
}

filebeatA.yml

filebeat:
  prospectors:
    -
      paths:
        - /tmp/log*
      input_type: log
      scan_frequency: 0s
output:
  file:
    path: "/tmp"
    filename: filebeat.log
    rotate_every_kb: 1000
    number_of_files: 100
  console:
shipper:
  name: filebeatA
  tags: ["service-X", "web-tier"]
logging:
  to_syslog: false
  to_files: false
  level: info

filebeatB.yml

filebeat:
  prospectors:
    -
      paths:
        - /tmp/filebeat.log
      input_type: log
      scan_frequency: 0s
output:
  file:
    path: "/tmp"
    filename: final.log
    rotate_every_kb: 3000
    number_of_files: 10
  console:
shipper:
  name: filebeatB
  tags: ["service-X", "web-tier"]
logging:
  to_syslog: false
  to_files: false
  level: info

Verification

This command checks for gaps in the sequence numbers generated by the Logstash generator:
ls -v filebeat.log* | xargs -i grep generator '{}' | sed 's/.*sequence\\\"://g' | sed 's/,.*//g' | sort -n | awk -v "p=1" '(p==1){p=$1-1}($1!=p+1){print p+1"-"$1-1}{p=$1}END{print "ok"}'

This command checks for gaps in the sequence numbers received from FilebeatA:
ls -v final.log* | xargs -i grep generator '{}' | sed 's/.*sequence\\\\\\"://g' | sed 's/,.*//g' | sort -n | awk -v "p=1" '(p==1){p=$1-1}($1!=p+1){print p+1"-"$1-1}{p=$1}END{print "ok"}'

If the last command above prints at least one pair of numbers, you can check that those sequences exist in _filebeat_ but don't exist in _final_.

For example, if 344356-352529 is printed:
grep -H generator filebeat.log* | grep 344356 prints one line.
grep -H generator final.log* | grep 344356 doesn't print anything.

In fact, querying for the previous sequence value (344355) shows that the filebeat.log.X filename (from grep -H) is different: the file was cut off right at that point.
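The gap-detection logic in the commands above can be exercised on its own with synthetic data. This sketch feeds the sequence 1,2,3,7,8,9,10 through the same kind of awk pass; the missing range 4-6 is reported, followed by "ok".

```shell
# Self-contained version of the gap check: report any missing ranges
# in a sorted stream of sequence numbers.
printf '%s\n' 1 2 3 7 8 9 10 \
  | sort -n \
  | awk '(NR==1){p=$1} (NR>1 && $1!=p+1){print p+1"-"$1-1} {p=$1} END{print "ok"}'
# prints:
#   4-6
#   ok
```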

Proposed solution

  • Each FilebeatA output file is created with a creation timestamp in its name (e.g. filebeat.log.YY-MM-dd-hh-mm-ss), so FilebeatB will be able to read it until rotation happens.
  • At the moment of rotation, a new file is created with its own creation timestamp, but the previous one still exists to be processed by FilebeatB, which will close it after a certain timeout.
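A rough sketch of what this proposal would look like on disk (the rotate function and file names are hypothetical, not a real Filebeat option): each rotation only creates a new timestamped file, leaving the previous one untouched so a downstream reader can drain it.

```shell
#!/bin/sh
# Hypothetical timestamp-based rotation: never rename or delete the old file.
dir=/tmp/rotate_demo
mkdir -p "$dir"
rotate() {
  ts=$(date +%Y-%m-%d-%H-%M-%S)
  current="$dir/filebeat.log.$ts"
  : > "$current"
}
rotate
echo "event before rotation" >> "$current"
sleep 1                      # ensure a distinct timestamp for the demo
rotate                       # old file stays in place, readable by FilebeatB
ls "$dir"/filebeat.log.*     # both files remain visible
```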

@mostolog Thanks a lot for the effort to write down all the details. This helps a lot in investigating and reproducing. I will have a closer look later.

One thing I spotted that worried me is that you use scan_frequency: 0. This should be at least 1s, as 0 can lead to very high CPU usage and unexpected behaviour. Could you retry it with 1s? It would also be interesting to see whether your issue persists in 5.0.0-alpha3.

I have modified scan_frequency to 1s in both FilebeatA and FilebeatB. As the documentation doesn't say anything about "unexpected behaviour", should I open a bug to fix that?

Same behaviour with 5.0.0-alpha3-1: FilebeatB is missing events.

Is there any other test I could do? Should I file the issue now?

Just created Issue #5503

I'm currently looking into it and will let you know when I find more details.

Note about the Docker images: be aware that these are not official images from Elastic.

This topic was automatically closed after 21 days. New replies are no longer allowed.