FileBeat log rotation problem with Log4j

We are using 6.4.2 version of Elastic Stack.

How we tested?

We created a java class to generate log.

package com.jcg.examples;

import org.apache.log4j.Logger;

public class LoggerMain {

	public static final Logger logger = Logger.getLogger(LoggerMain.class);

	public static void main(String[] args) throws InterruptedException {
		int i = 0;
		while (true) {"This is a test log "+i++);


log4j.rootLogger=INFO, loggerId
log4j.appender.loggerId.layout.ConversionPattern=%d [%t] %-5p (%F:%L) - %m%n


    - type: log
        - C:\\Users\\jegadka\\Downloads\\Log4jExample\\example.log*
      log-type: "test-logroll"

      hosts: ["localhost:5044"]

When we try to execute & filebeat together , we are running out of space.

Since we mentioned example.log* more number of harvesters get opened and it keeps file opened. (Log rotation happens through log4j, filebeat allows it to happen until log4j removes file after log4j.appender.loggerId.MaxBackupIndex=5 reaches).

The problem arises only when 5 log files are generated and when 6th log file gets generated, log4j removes 6th log file. What happens here(our guess), harvester is not getting closed(due to default scan_frequency is higher -10s???!!!) rather it tries to access file at the same location( * C:\Users\jegadka\Downloads\Log4jExample\example.log* ) and file exists.

(May be due to contention between 2 harvesters for a file?? not sure though) We are not sure if this is filebeat bug or this is how filebeat expected to run.

What we understood clearly is, when you set log4j.appender.loggerId.MaxBackupIndex=1000, there will not be any issues with log rotation until 1000 log files, but once it crosses 1000, log rotation gets stopped and one file keep on growing until disk size is full).

We tried almost all possible options like, close_renamed, close_timeout.

Anyone can reproduce this issue. Any solution?

File rotation and concurrently reading logs on windows is tricky. Windows will keep the file contents and meta-data for as long as one process is accessing the file. When the rotator deletes the file, then the file is not deleted, but marked. This means that the file can not be opened anymore (but is still open), but no new file with the same name can be generated -> logging applications might get stalled.
In order to overcome these limitation imposed by the Windows file system I'd recommend to include a timestamp in the rotated file by configuring FileNamePattern in your appender.

On the filebeat side close_remove is not always working correctly, due to the file not being deleted as long as filebeat is trying to read it. I thinkg Filebeat 7.0 includes a workaround in code to improve the reliability of close_remove. It's still a tricky situation. In order to guarantee progress one wants to close files more often (which unfortunately creates extra work due to closing and reopening files over and over again). Setting like close_timeout and close_eof can help here. Using harvester_limit you can also limit the maximum number of open files. Again due to windows woes I'd recommend to have harvester_limit < number of files.

Another reason you run into this is throughput/backpressure/network. Filebeat can't always close files if the outputs are inactive, as some checks like close_eof and close_remove require filebeat to make progress (we need to compare ACKed offsets to file sizes). If the output gets stalled, then filebeat queues fill up and filebeat can not progress here. Spooling can help to move logs file to a spool file (preconfigured size), but once the spool is full filebeat can not progress anymore again. And the close settings might lead to data-loss, as files get rotated/deleted before filebeat has had a chance to send everything. While spooling to disk can help with temporary bursts and downtimes, one still should tune Elasticsearch and Filebeat->Elasticsearch throughput.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.