Log rotation and filebeat

I have multiple long-running jobs that produce a lot of logs. We use the Linux logrotate utility to rotate them. The problem is that filebeat can miss logs. For example, I have a log file named output.log that is written to at high frequency, and as soon as it reaches 200M we rotate it.

If filebeat is down or a bit slow, it can miss logs because the content of output.log has been moved to output.log.1.
If we also scan output.log* files, we get duplicates.

Questions:

  • How can we design a solution with filebeat and logrotate so that we don't miss a log message?
  • Can filebeat also rotate files? Since filebeat knows how much it has processed, having it do the rotation would be the best solution.

For starters, I think you should lower the scan_frequency value (default is 10s). Try 1s; we don't recommend using values smaller than that.
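
A minimal sketch of where that setting would go, assuming a prospector-style config. The path /var/log/myapp/*.log is a placeholder and not taken from this thread:

```yaml
filebeat:
  prospectors:
    -
      input_type: log
      paths:
        - "/var/log/myapp/*.log"   # hypothetical path, adjust to your setup
      # How often the prospector checks the paths for new files (default 10s)
      scan_frequency: 1s
```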

In order to provide further tuning, I'd like to have more information from you:

  • Your current config.
  • How fast are files created.
  • How fast are files rotated (that is, how long until they reach 200M)

How fast are files created.
It depends on the load, so it is hard for me to give you exact numbers. In a problematic VM, we could have a new file almost every minute but with little data inside (~20MB).

How fast are files rotated (that is, how long until they reach 200M)
Well, it varies. In some cases it could take around 3-4 hours.

Your current config.
Here is our current config:


filebeat:
  prospectors:
    -
      document_type: jobA
      input_type: log
      paths:
        - "/path/to/jobA/logs/*/*.log"
      multiline:
         pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}'
         negate: true
         match: after
      fields:
          instance_name: ""
          instance_id: ""

    -
      document_type: jobB
      input_type: log
      paths:
        - "/path/to/jobB/logs/*/*/std*"
        - "/path/to/jobB/logs2/*/std*"
      multiline:
         pattern: '(^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})|(^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})'
         negate: true
         match: after
      fields:
          instance_name: ""
          instance_id: ""

  registry_file: /path/to/filebeat/registry

output:
  logstash:
    hosts: [":5044"]
    bulk_max_size: 1024


shipper:

logging:
  to_syslog: false
  to_files: true
  files:
    path: /path/to/filebeat/log
    name: filebeat.log
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 7

Right now, we cannot guarantee that all logs make it to your output before they are rotated out. We are planning to introduce better support for reading from rotating logs.

Unfortunately, Filebeat does not support log rotation yet.

Could you please open an enhancement request on GH? https://github.com/elastic/beats/issues/new
I started collecting requirements for this feature just a few days ago. Any input and use cases would help us a lot in designing a useful solution. Thank you in advance.

Unfortunate.

Yes, I will create a request.

I'm running into the exact same issue. Since we don't want to fill up our disk with FB holding the rotated file open, we have a close_timeout, but that timer starts at the start of the harvester. It would be helpful to have an option to start a timer on rename, so we can say: after the file has been rotated to log.1, give us 10m to finish reading the file or else give up, log that we gave up, and free the file. Then I can at least see this event occurring and take the correct action. (Preferably a WARN level log.)
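
For reference, a sketch of the existing option mentioned above, assuming a Filebeat version that supports close_timeout. The path is a placeholder and the 10m value is only illustrative. As described above, this timer runs from the start of the harvester, not from the rename, so it is not the requested behavior:

```yaml
filebeat:
  prospectors:
    -
      input_type: log
      paths:
        - "/var/log/myapp/*.log"   # hypothetical path
      # Close the file handle after this much time, counted from the moment
      # the harvester was started, whether or not reading has finished.
      close_timeout: 10m
```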

The recommended way is to also include all rotated files in the pattern.
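
A sketch of what that could look like, assuming rotated files keep the original name plus a suffix (for example output.log.1) and are not compressed. The path is a placeholder:

```yaml
filebeat:
  prospectors:
    -
      input_type: log
      paths:
        # The trailing * also matches rotated files such as output.log.1
        - "/var/log/myapp/*.log*"   # hypothetical path
```

As far as I understand, Filebeat tracks files by inode and device rather than by name, so a file that is only renamed should continue from its stored offset instead of being read again.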

@bsikander Can you share why you have duplicated events if you include the rotated files? This should not be the case. Filebeat was designed exactly for this kind of use case.

@djtecha What you describe sounds like a different issue from the one discussed above. It's definitely an interesting feature request.

@ruflin
As I mentioned, I am using the Linux logrotate utility to rotate. I am using logrotate's "copytruncate" option, which copies all the logs from the original file to a new file and then truncates the original file. This way we have a completely new log file, and if I am not wrong, filebeat will also consider it new and will process all the logs in the rotated file again, which causes duplicates.

Can you share why you use copytruncate? With copytruncate there is a small time frame in which you will lose log lines (not because of Filebeat), so I would not recommend this mechanism for log rotation if it's important to keep all log lines. logrotate supports quite a few other mechanisms which don't have this problem.

ahan, which one would you recommend to avoid this problem?

All the options that rename the file, add a suffix to the name, and create a new file to write the data to. I would stay away from anything that copies or truncates the file.

Thanks. I will give it a try.

I tried to look for other options. The problem that I have is that my process redirects the stdout and stderr to a log file. If I don't use copytruncate then the process keeps on writing to the rotated file. Changing the behavior of my process is not an option right now.

For example:
1- The process is writing to abc.log
2- Logrotate runs; now I have 2 files, abc.log and abc.log.1
3- The process keeps writing to abc.log.1 and the new abc.log stays at 0 size

copytruncate seems to be the only option.
Similar problem to mine: https://unix.stackexchange.com/questions/147938/rotating-log-files-while-process-still-running

Does your process not accept a SIGHUP to reload its logging? Copytruncate is really a last-resort option. We usually have rsyslog catch a process that would otherwise write to syslog, based on a regex match, and then rsyslog can accept a SIGHUP. Though I know most OSS has this functionality built in.

I have logs from Apache Spark and I doubt that it catches SIGHUP.

I see, the above is probably why copytruncate was invented :frowning:

How did you configure the Apache Spark logging? I did a quick search and found https://mapr.com/blog/how-log-apache-spark/ It seems Spark uses log4j, which supports lots of other options?

Thank you for looking it up. I am already aware of the logging options in Spark. I am hitting a corner case where log4j is not viable, which is why I tried an alternative like "logrotate" with copytruncate. It seems that I should open a ticket on Spark.
