Filebeat does not release log files issue


(Can Zhang) #1

My Filebeat version is 1.3.1(amd64)

filebeat.yml config like this:
-
paths:
- /opt/mateinfo/logs/app/app.log
input_type: log
fields:
log_format: wfm_log4j
fields_under_root: true
ignore_older: 1m
close_older: 30m
document_type: app_log

using command lsof ,I find that:

lsof |grep filebeat

filebeat 32652 root 14r REG 202,32 52433403 262268 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 15r REG 202,32 52507192 262151 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 16r REG 202,32 52456387 262170 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 17r REG 202,32 52447751 262161 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 18r REG 202,32 52441595 262157 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 19r REG 202,32 52438227 262158 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 20r REG 202,32 52537002 262162 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 21r REG 202,32 52458855 262165 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 22r REG 202,32 52528456 262173 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 23r REG 202,32 52476072 262167 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 24r REG 202,32 52446487 262171 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 25r REG 202,32 52509690 262176 /opt/mateinfo/logs/app/app.log.15
filebeat 32652 root 26r REG 202,32 52469426 262169 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 27r REG 202,32 52488108 262172 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 28r REG 202,32 52451124 262174 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 29r REG 202,32 52444883 262156 /opt/mateinfo/logs/app/app.log.15 (deleted)
filebeat 32652 root 30r REG 202,32 52613385 262160 /opt/mateinfo/logs/app/app.log.14
filebeat 32652 root 31r REG 202,32 52494350 262177 /opt/mateinfo/logs/app/app.log.13
filebeat 32652 root 32r REG 202,32 3065192 262178 /opt/mateinfo/logs/app/servicecreator.log
filebeat 32652 root 33r REG 202,32 52571672 262168 /opt/mateinfo/logs/app/app.log.7
filebeat 32652 root 34r REG 202,32 52464099 262164 /opt/mateinfo/logs/app/app.log.11
filebeat 32652 root 35r REG 202,32 52556336 262180 /opt/mateinfo/logs/app/app.log.10
filebeat 32652 root 36r REG 202,32 52513605 262181 /opt/mateinfo/logs/app/app.log.8
filebeat 32652 root 37r REG 202,32 52503382 262185 /opt/mateinfo/logs/app/app.log.4
filebeat 32652 root 38r REG 202,32 52480734 262182 /opt/mateinfo/logs/app/app.log.6
filebeat 32652 root 39r REG 202,32 52467621 262184 /opt/mateinfo/logs/app/app.log.5
filebeat 32652 root 40r REG 202,32 52482734 262186 /opt/mateinfo/logs/app/app.log.3
filebeat 32652 root 41r REG 202,32 31626833 262189 /opt/mateinfo/logs/app/app.log
filebeat 32652 root 42r REG 202,32 52453237 262188 /opt/mateinfo/logs/app/app.log.1

ll /opt/mateinfo/logs/app/app.log*
-rw-r--r-- 1 mateinfo inoc 39628681 May 16 15:34 /opt/mateinfo/logs/app/app.log
-rw-r--r-- 1 mateinfo inoc 52453237 May 16 15:20 /opt/mateinfo/logs/app/app.log.1
-rw-r--r-- 1 mateinfo inoc 52556336 May 16 14:59 /opt/mateinfo/logs/app/app.log.10
-rw-r--r-- 1 mateinfo inoc 52464099 May 16 14:54 /opt/mateinfo/logs/app/app.log.11
-rw-r--r-- 1 mateinfo inoc 52584959 May 16 14:50 /opt/mateinfo/logs/app/app.log.12
-rw-r--r-- 1 mateinfo inoc 52494350 May 16 14:50 /opt/mateinfo/logs/app/app.log.13
-rw-r--r-- 1 mateinfo inoc 52613385 May 16 14:43 /opt/mateinfo/logs/app/app.log.14
-rw-r--r-- 1 mateinfo inoc 52509690 May 16 14:43 /opt/mateinfo/logs/app/app.log.15
-rw-r--r-- 1 mateinfo inoc 52550752 May 16 15:15 /opt/mateinfo/logs/app/app.log.2
-rw-r--r-- 1 mateinfo inoc 52482734 May 16 15:15 /opt/mateinfo/logs/app/app.log.3
-rw-r--r-- 1 mateinfo inoc 52503382 May 16 15:15 /opt/mateinfo/logs/app/app.log.4
-rw-r--r-- 1 mateinfo inoc 52467621 May 16 15:15 /opt/mateinfo/logs/app/app.log.5
-rw-r--r-- 1 mateinfo inoc 52480734 May 16 15:06 /opt/mateinfo/logs/app/app.log.6
-rw-r--r-- 1 mateinfo inoc 52571672 May 16 15:04 /opt/mateinfo/logs/app/app.log.7
-rw-r--r-- 1 mateinfo inoc 52513605 May 16 15:04 /opt/mateinfo/logs/app/app.log.8
-rw-r--r-- 1 mateinfo inoc 52529654 May 16 14:59 /opt/mateinfo/logs/app/app.log.9

who can tell me how to solve this issue ?


(Carlos Pérez Aradros) #2

Hi @can.zhang,

If I get it right, filebeat should not be tracking app.log.*, as you are only configuring /opt/mateinfo/logs/app/app.log path, can you share your full filebeat.yml? Have you considered using a more recent version of filebeat?

Cheers


(Can Zhang) #3
  • # Paths that should be crawled and fetched. Glob based paths.
    # To fetch all ".log" files from a specific level of subdirectories
    # /var/log/*/*.log can be used.
    # For each file found under this path, a harvester is started.
    # Make sure not file is defined twice as this can lead to unexpected behaviour.
    paths:
      - /opt/mateinfo/logs/app/app.log
      - /opt/mateinfo/logs/app/app/app.log
      #- c:\programdata\elasticsearch\logs\*
    
    #encoding: plain
    
    # Type of the files. Based on this the way the file is read is decided.
    # The different types cannot be mixed in one prospector
    #
    # Possible options are:
    # * log: Reads every line of the log file (default)
    # * stdin: Reads the standard in
    input_type: log
    
    # Exclude lines. A list of regular expressions to match. It drops the lines that are
    # matching any regular expression from the list. The include_lines is called before
    # exclude_lines. By default, no lines are dropped.
    # exclude_lines: ["^DBG"]
    
    # Include lines. A list of regular expressions to match. It exports the lines that are
    # matching any regular expression from the list. The include_lines is called before
    # exclude_lines. By default, all the lines are exported.
    # include_lines: ["^ERR", "^WARN"]
    
    # Exclude files. A list of regular expressions to match. Filebeat drops the files that
    # are matching any regular expression from the list. By default, no files are dropped.
    # exclude_files: [".gz$"]
    
    # Optional additional fields. These field can be freely picked
    # to add additional information to the crawled log files for filtering
    fields:
      log_format: wfm_log4j
    #  review: 1
    
    # Set to true to store the additional fields as top level fields instead
    # of under the "fields" sub-dictionary. In case of name conflicts with the
    # fields added by Filebeat itself, the custom fields overwrite the default
    # fields.
    fields_under_root: true
    
    # Ignore files which were modified more then the defined timespan in the past.
    # In case all files on your system must be read you can set this value very large.
    # Time strings like 2h (2 hours), 5m (5 minutes) can be used.
    ignore_older: 1m
    
    # Close older closes the file handler for which were not modified
    # for longer then close_older
    # Time strings like 2h (2 hours), 5m (5 minutes) can be used.
    close_older: 10m
    
    # Type to be published in the 'type' field. For Elasticsearch output,
    # the type defines the document type these entries should be stored
    # in. Default: log
    document_type: app_log
    
    # Scan frequency in seconds.
    # How often these files should be checked for changes. In case it is set
    # to 0s, it is done as often as possible. Default: 10s
    #scan_frequency: 10s
    
    # Defines the buffer size every harvester uses when fetching the file
    #harvester_buffer_size: 16384
    
    #max_bytes: 10485760
    
    # Mutiline can be used for log messages spanning multiple lines. This is common
    # for Java Stack Traces or C-Line Continuation
    #multiline:
    
      # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
      #pattern: ^\[
    
      # Defines if the pattern set under pattern should be negated or not. Default is false.
      #negate: false
    
      #match: after
    
      # The maximum number of lines that are combined to one event.
      # In case there are more the max_lines the additional lines are discarded.
      # Default is 500
      #max_lines: 500
    
      # After the defined timeout, an multiline event is sent even if no new pattern was found to start a new event
      # Default is 5s.
      #timeout: 5s
    
    # Setting tail_files to true means filebeat starts readding new files at the end
    # instead of the beginning. If this is used in combination with log rotation
    # this can mean that the first entries of a new file are skipped.
    #tail_files: false
    
    #backoff: 1s
    
    # Max backoff defines what the maximum backoff time is. After having backed off multiple times
    # from checking the files, the waiting time will never exceed max_backoff idenependent of the
    # backoff factor. Having it set to 10s means in the worst case a new line can be added to a log
    # file after having backed off multiple times, it takes a maximum of 10s to read the new line
    #max_backoff: 10s
    
    # The backoff factor defines how fast the algorithm backs off. The bigger the backoff factor,
    # the faster the max_backoff value is reached. If this value is set to 1, no backoff will happen.
    # The backoff value will be multiplied each time with the backoff_factor until max_backoff is reached
    #backoff_factor: 2
    
    # This option closes a file, as soon as the file name changes.
    # This config option is recommended on windows only. Filebeat keeps the files it's reading open. This can cause
    # issues when the file is removed, as the file will not be fully removed until also Filebeat closes
    # the reading. Filebeat closes the file handler after ignore_older. During this time no new file with the
    # same name can be created. Turning this feature on the other hand can lead to loss of data
    # on rotate files. It can happen that after file rotation the beginning of the new
    # file is skipped, as the reading starts at the end. We recommend to leave this option on false
    # but lower the ignore_older value to release files faster.
    #force_close_files: false
    

    Additional prospector

    #-
    # Configuration to use stdin input
    #input_type: stdin

    General filebeat configuration options

    Event count spool threshold - forces network flush if exceeded

    #spool_size: 2048

    Enable async publisher pipeline in filebeat (Experimental!)

    #publish_async: false

    Defines how often the spooler is flushed. After idle_timeout the spooler is

    Flush even though spool_size is not reached.

    #idle_timeout: 5s

    Name of the registry file. Per default it is put in the current working

    directory. In case the working directory is changed after when running

    filebeat again, indexing starts from the beginning again.

    #registry_file: .filebeat


(Can Zhang) #4

force_close_files: false

i want to force_close_files:true,but his config option is recommended on windows only.


(Carlos Pérez Aradros) #5

I think you are hitting an old bug in 1.X branch, could you give a try to a more recent version? Our latest GA release is 5.4.0


(Can Zhang) #6

I would like to know what cause this issue?
And I will test the recent version filebeat .


(Carlos Pérez Aradros) #7

You are most likely affected by an old bug that kept rotated files open after they are renamed (from app.log to app.log.N). This doesn't happen in newer versions


(Can Zhang) #8

ok ,I am goint to try a recent version. Thank you.


(Can Zhang) #9

now,I test the filebeat of 5.4.0 version .

filebeat.yml config is

filebeat.prospectors:

paths:
- /opt/mateinfo/logs/app/servicecreator.log
- /opt/mateinfo/logs/servicecreator/app/servicecreator.log
input_type: log
fields:
log_format: wfm_log4j
fields_under_root: true
close_renamed: true
document_type: servicecreator_log

paths:
- /opt/mateinfo/logs/app/ws.log
- /opt/mateinfo/logs/ws/app/ws.log
input_type: log
fields:
log_format: wfm_log4j
fields_under_root: true
close_renamed: true
document_type: ws_log

  • paths:
    • /opt/mateinfo/logs/app/app.log
    • /opt/mateinfo/logs/app/app/app.log
      input_type: log
      fields:
      log_format: wfm_log4j
      fields_under_root: true
      ignore_older: 1m
      close_renamed: true
      document_type: app_log

But,my issue is is not resolved.

-bash-4.2$ lsof |grep filebeat|grep app.log
filebeat 16916 mateinfo 5r REG 202,32 52471372 262151 /opt/mateinfo/logs/app/app.log.3
filebeat 16916 mateinfo 8r REG 202,32 52503400 262156 /opt/mateinfo/logs/app/app.log.2
filebeat 16916 mateinfo 10r REG 202,32 49726925 262158 /opt/mateinfo/logs/app/app.log


(Can Zhang) #10

Do you know the root cause of this issue?

the log file is rotated too fast ?


(ruflin) #11

If you see the same issue with 5.4 I assume filebeat is not done sending the events so it doesn't close the file. Could you share your filebeat log file? I assume either we see there quite a few errors transmitting data or the remote instance not keeping up with the number of events happening. How many events do you have per second?


(Can Zhang) #12

How to send the filebeat log to you?


(Can Zhang) #13

about 25,000 rows per second at the peak.


(ruflin) #14

For the filebeat log: Best put it into a gist and link it here. What output are you using? How does your setup look like?


(Can Zhang) #15

filebeat log
https://s3-ap-southeast-1.amazonaws.com/sgbetakibana/filebeat.log.tgz


(Can Zhang) #16

my elk structure


(ruflin) #17

You have about ~25k events per 30s send on the output side. That makes only <1000 events per second to LS which is rather slow. There can be different reasons for that like network, LS, ES. How many filebeat instances do you have sending to your LS instance?

Based on the above it makes a lot of sense that filebeat is keeping the files open as your setup does not keep up with the number of events sent. I doubt the limitation here is filebeat itself. You should investigate where the bottleneck in your architecture is. Check what network bandwidth you have, how many events you can get through 1 LS instance and your ES cluster. Perhaps you need to add bigger LS instances or add more?


(Can Zhang) #18

There are about 20 filebeat instances .
I doubt the limitation here is filebeat itself. ?
How many events per second the filebeat can handle?


(ruflin) #19

How many events filebeat can handle depends on lots of factors like disk speed, network, cpu, multiline config and others. Best is to take one of your machines with a big log file and write the events to disk which is assumed to be faster then the network. Then you see how many events you will reach. I assume the number is quite a bit bigger then what you see above.


(system) #20

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.