Hi, when I use filebeat to collect the log data. I found that the output file have many dumplications of a log entry. The log data is generated by a python script. Every log entry has a unique number and every 300 entries will trigger a logrotate. First, I start the filebeat, then run the script to generate the log. At last, I found that the 1-300 entries occuer once, 300-600 entries occure twice, 600-900 entries occure three times and so on.
Hi,
Just to be accurate - filebeat should not duplicate entries after log rotation happens? We are starting with ELK, have everything set up and sending the logs to ES and at the moment we are trying to log rotate every week. We are using Centos 7 logrotate with similar conf to OP. The only worrying part is if we rotate log and then tell filebeat to track both logs for 5 minutes to not loose any entries, will filebeat treat the rotate file production.log1 as a new log and will resend all the data to ES?
filebeat uses inodes to keep track if file identity. It will detect production.log1 being a renamed file (if inode didn't change, check with ls command) and will continue processing where it left of in case anything was missing. The newly created production.log should be processed from the very beginning.
I think it will not resend the production.log.1 as a new log if you don't add it to the configuration of the filebeat. Because you can see the same source of the dumplication in the beat.log.
As Steffen wrote, filebeat does track files by inode and not name. So if a file is rotated, data will not be resent as it still the same file.
@haha I'm still trying to figure out what goes wrong in your case especially that you mention you also see this under the 5.0 alpha snapshot. Any chance to post the debug log of the 5.0 run? How do you exactly start filebeat?
I'm sorry. I think my python script cause this problem. I use python logging module to write the log. At first, the filebeat get the right output with the logrotate. But the script write all log in the production.log.1. That, I think is a wrong way to use logging and logrotate. So , I change my script to close filehandler before logrotate and open a new filehandler after logrotate and get the above problem. Today, I try to use copytruncate in the logrotate and don't refresh the filehandler when logrotate. Then, I get the correct filebeat output.
So, I think my way to use logrotate and logging may not be suitable. Thank you for you help.
We use Python for some of our system tests and use the internal Python log rotation feature: https://github.com/elastic/beats/blob/master/filebeat/tests/system/test_load.py#L40 This should automatically do the right thing. I didn't check your script in detail but you can check ours to see if there are some difference. If not needed, I would stick with rotation instead of copytruncate as this can lead to data loss (as mentioned in the logrotate docs).
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.