Filebeat keeps (deleted) files open


(Andrei Zakabluk) #1

filebeat-1.2.3-x86_64

conf:

filebeat:
  prospectors:
    -
      paths:
        - /var/opt/tibco/logs/*.log
      input_type: log
      close_older: 5m
      document_type: tibco_6
      multiline:
        pattern: '\[BW'
        negate: true
        match: after
      force_close_files: true
output:

  elasticsearch:
    hosts: ["srv-testapp-001t:9200"]

--
OS 2.6.18-274.el5

After Filebeat has been running for a long time, I have no free space on my server.
When I execute /usr/sbin/lsof -p 8095 (the Filebeat PID), I see that the process holds many deleted files open:

filebeat 31893 tibco   46r   REG               8,17 20520987    638989 /var/opt/tibco/logs/UssdCs-UssdCs-prod6.log.4 (deleted)
filebeat 31893 tibco   47r   REG               8,17  5512425    638991 /var/opt/tibco/logs/VPartnerGw-mobileapi-VPartnerGw-mobileapi-prod6.log
filebeat 31893 tibco   48r   REG               8,17 20461307    638985 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 31893 tibco   49r   REG               8,17 20526952    638984 /var/opt/tibco/logs/UssdCs-UssdCs-prod6.log.4 (deleted)
filebeat 31893 tibco   50r   REG               8,17 20460267    638990 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 31893 tibco   51r   REG               8,17 18119756    639002 /var/opt/tibco/logs/partnergateway-PartnerGatewayConnector-prod6.log
filebeat 31893 tibco   52r   REG               8,17  1661592    639015 /var/opt/tibco/logs/VPartnerGw-voka-VPartnerGw-voka-prod6.log
filebeat 31893 tibco   53r   REG               8,17 20461356    638997 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 31893 tibco   54r   REG               8,17 20504359    638978 /var/opt/tibco/logs/UssdCs-UssdCs-prod6.log.4 (deleted)
filebeat 31893 tibco   55r   REG               8,17 20532252    638981 /var/opt/tibco/logs/UssdCs-UssdCs-prod6.log.4 (deleted)
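A quick way to total the space still held open by those deleted files is to sum the SIZE column of the `(deleted)` lines. The sketch below assumes SIZE is the 7th whitespace-separated field, as in the listing above, and pipes in two sample lines with shortened, hypothetical paths; in practice you would feed it `lsof -p <filebeat-pid>` directly:

```shell
# Sum the SIZE column (7th field in this lsof layout) of open files
# that lsof marks "(deleted)". The two printf lines are stand-ins for
# real `lsof -p <pid>` output.
printf '%s\n' \
  'filebeat 31893 tibco 46r REG 8,17 20520987 638989 /var/opt/tibco/logs/a.log.4 (deleted)' \
  'filebeat 31893 tibco 47r REG 8,17 5512425 638991 /var/opt/tibco/logs/b.log' |
awk '/\(deleted\)/ {sum += $7} END {print sum " bytes held by deleted files"}'
# prints: 20520987 bytes held by deleted files
```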

I read issue https://github.com/elastic/beats/issues/1600 but it did not help me.


(ruflin) #2

Under filebeat 1.2.3 it is possible that, if a file is rotated during the scan, a file handle is kept open. Some improvements were made in 1.3 to prevent this issue. For version 5 the file handling part was rewritten, and these issues should not exist anymore.

For version 1.2.3, currently the only fix when this happens is to restart filebeat, which frees all file handles. Obviously this is not a nice solution. How long had filebeat been running until the issue happened?


(Andrei Zakabluk) #3

Hi! Filebeat had been running for 10 minutes.
I tried 1.3 and see this in the debug logs:
Launching harvester on new file: /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log. Old file was probably rotated

filebeat 2635 tibco   30r   REG   8,17 20473992    639009 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 2635 tibco   40r   REG   8,17 20469091    639026 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 2635 tibco   41r   REG   8,17 20466790    639084 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 2635 tibco   42r   REG   8,17 20471145    638984 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 2635 tibco   43r   REG   8,17 20470834    639075 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 (deleted)
filebeat 2635 tibco   45r   REG   8,17 20471937    639089 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4
filebeat 2635 tibco   48r   REG   8,17 20465266    639090 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.3
filebeat 2635 tibco   49r   REG   8,17 20467060    639091 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.2
filebeat 2635 tibco   51r   REG   8,17  9702675    639093 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log
filebeat 2635 tibco   52r   REG   8,17 20472383    639092 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.1

Now I will try version 5.


(ruflin) #4

Can you share some details on the number of events you have per second and how often your files are rotated? Thanks for testing the 5.0 release.


(Andrei Zakabluk) #5

Hi. I have the same problems.

  1. ... number of events you have per second - I don't know. How can I check?
  2. ... how often your files are rotated? - When the log reaches 20 MB, the application rotates to a new file.
    For example:
    Sep 1 13:35 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4
    Sep 1 13:37 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.3
    Sep 1 13:40 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.2
    Sep 1 13:42 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.1
    Sep 1 13:43 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log

In the config file:

filebeat:
  prospectors:
    -
      paths:
        - /var/opt/tibco/logs/*.log
      input_type: log
      close_older: 5m
      document_type: tibco_6
      multiline:
        pattern: '\[BW'
        negate: true
        match: after
      force_close_files: true
Maybe the nohup output file from my filebeat would be interesting for you?


(ruflin) #6

@zakabluk When you wrote that you have the same problem, did you mean the same problem with the 5.0 nightly build? https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

About number of events: An estimate of how many log lines you create would be good. Just a rough estimate, to understand whether we are talking about a high-load project or not.

For file rotation, related to the previous point: how often is the file rotated because it reaches 20 MB? Every 10 s, 10 minutes, 1 hour?
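A rough way to get that lines-per-second number is to count the lines in the active log twice and divide the difference by the interval. The sketch below only demonstrates the arithmetic on a temp file; in practice LOG would point at the active log under /var/opt/tibco/logs/, and a rotation during the interval would skew the count:

```shell
# Estimate log lines per second: count lines, wait, count again,
# divide the delta by the interval. A temp file stands in for the
# real active log here.
LOG=$(mktemp)
seq 1 100 > "$LOG"            # stand-in for existing log content
before=$(wc -l < "$LOG")
INTERVAL=1                    # use a longer interval (e.g. 10s) in practice
sleep "$INTERVAL"
seq 1 50 >> "$LOG"            # stand-in for lines written meanwhile
after=$(wc -l < "$LOG")
echo "$(( (after - before) / INTERVAL )) lines/sec"
rm -f "$LOG"
# prints: 50 lines/sec
```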


(Andrei Zakabluk) #7

Hi! Today I tested filebeat-5.0.0-alpha6-SNAPSHOT-linux-x86_64. Same problems: after 10 minutes, the messages about deleted files began to appear.
One file has about 10K rows. All events are written to the file VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log. When this file reaches 20 MB, it is renamed to VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.1 and the file VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 is deleted.

Below you can see the log rotation times:

Sep 2 14:07 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4
Sep 2 14:09 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.3
Sep 2 14:11 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.2 - not in use
Sep 2 14:14 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.1 - not in use
Sep 2 14:14 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log - in use by the application

Test with lsof:
[tibco]$ /usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4
[tibco]$ /usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
bwengine 26344 tibco 303w REG 8,17 10276872 639075 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log

Filebeat watches the logs directory, which is used to store the logs of about 30 applications. ussdtail is the most active one.


(ruflin) #8

Thanks for testing with the snapshot. Can you share the exact messages that show up?

I'm not sure I can follow your lsof check. It seems like filebeat is not watching the .4 file which is good?


(ruflin) #9

BTW: close_older was renamed to close_inactive in 5.0 and force_close_files is now close_renamed and close_removed.


(Andrei Zakabluk) #10

My config:

filebeat:
  prospectors:
    -
      paths:
        - /var/opt/tibco/logs/*.log
      input_type: log
      close_inactive: 15m
      document_type: tibco_6
      multiline:
        pattern: '\[BW'
        negate: true
        match: after
close_renamed: true
close_removed: true

And my problems did not disappear.

  1. It seems like filebeat is not watching the .4 file which is good?
    For example, this file:
    In filebeat's lsof output:
 filebeat 15747 tibco  979r   REG      8,17 20433483    639001 /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4 (deleted)

The same file on disk:

[server]$ ll /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4
-rw-r--r-- 1 tibco tibco 20433909 Sep  5 10:44 /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4

If use lsof for file /usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4
I see what file is not watching filebeat and other process.
Messages from the Filebeat log:

2016/09/05 10:09:03.757905 prospector_log.go:140: DBG  Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:03.757916 prospector_log.go:173: DBG  Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:03.757924 prospector_log.go:223: DBG  Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:13.762543 prospector_log.go:140: DBG  Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:13.762559 prospector_log.go:173: DBG  Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:13.762572 prospector_log.go:223: DBG  Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:23.767274 prospector_log.go:140: DBG  Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:23.767285 prospector_log.go:173: DBG  Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:23.767294 prospector_log.go:223: DBG  Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:33.771490 prospector_log.go:140: DBG  Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:33.771501 prospector_log.go:173: DBG  Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:33.771510 prospector_log.go:223: DBG  Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:43.775458 prospector_log.go:140: DBG  Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:43.775469 prospector_log.go:173: DBG  Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:43.775477 prospector_log.go:223: DBG  Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:53.779052 prospector_log.go:140: DBG  Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log

(ruflin) #11

I tried to reformat your post above. Please use 3 ticks around code blocks to make it more readable.

  1. The indentation for close_removed, close_renamed seems to be off. Not sure if that is a copy/paste problem or whether it is that way in the config file?
  2. I don't understand the following sentence: "I see what file is not watching filebeat and other process."
  3. For the filebeat log file, I would expect more debug messages with additional information between these outputs. Did you filter out part of it or is that everything from the log file?

(Andrei Zakabluk) #12

Hi.

  1. The indentation for close_removed, close_renamed seems to be off. Not sure if that is a copy / paste problem or it is that way in the config file?
    That was from the config file.
    I have now commented out these options and restarted filebeat.
    The problem persists.

  2. I don't understand the following sentence: "I see what file is not watching filebeat and other process."
    If use lsof for file /usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4
    I see that filebeat does NOT have the file /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4 open; it watches only /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log . But at the same time, if I check

lsof -p PID_FILEBEAT | grep 'deleted'
I get rows marked (deleted).
Example:
/var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4 (deleted)

  3. I have shared a file with the log from FB (Filebeat) in debug mode:
    https://yadi.sk/d/GfG8WtjUuy3cB
    It is a txt file; maybe it helps.

(ruflin) #13

I found the following line in your log file:

2016/09/09 11:11:50.008100 log.go:42: ERR Stop Harvesting. Unexpected file opening error: open /var/opt/tibco/logs/partnergateway-PartnerGatewayConnector-prod6.log: too many open files

It seems like filebeat is potentially opening too many files which could lead to this issue. How many files do you harvest at the same time? What is your ulimit?
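For reference, one hedged way to check both numbers on a Linux box is below; the /proc path is Linux-specific, and $$ (the current shell's PID) stands in for the filebeat PID:

```shell
# Soft limit on open file descriptors for processes started from
# this shell (filebeat inherits whatever limit its parent had):
ulimit -n
# Count the descriptors a running process currently holds; $$ (this
# shell) is used here as a stand-in for the filebeat PID.
ls /proc/$$/fd | wc -l
```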

Another thing I realised is that filebeat is constantly flushing events. The default for spool_size is 50, which is good to get started and keeps memory use low, but for lots of events this value should be increased. Can you try setting filebeat.spool_size: 2048 or even higher?

It seems there are several issues where one leads to another:

  • Events are not sent fast enough -> files are not closed because harvesting is blocked by the output
  • Because files are not closed -> too many files are open
  • Because too many files are open -> errors appear on the output, because the output also needs file descriptors: `2016/09/09 12:53:33.051400 single.go:140: ERR Connecting error publishing events (retrying): lookup srv-testapp-001t on 10.254.0.3:53: dial udp 10.254.0.3:53: socket: too many open files`

You should make sure that events can be sent from filebeat faster than new events are created. Increase the value as recommended above and check lines like the following:

2016/09/09 10:37:29.932138 output.go:109: DBG  output worker: publish 50 events

If afterwards it still constantly sends 2048 events per flush, you should increase the value even more.
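If grepping the full debug log is tedious, the per-flush count can be pulled out of such lines with a small filter. The sketch below runs against one sample line; in practice the filebeat debug log would be piped through the sed command:

```shell
# Pull the per-flush event count out of a filebeat debug line of the
# form "... output worker: publish N events". A sample line stands in
# for the real log stream here.
printf '2016/09/09 10:37:29.932138 output.go:109: DBG  output worker: publish 50 events\n' |
sed -n 's/.*publish \([0-9][0-9]*\) events.*/\1/p'
# prints: 50
```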

It would still be good to have some values on how many log lines are generated per second.


(system) #14

This topic was automatically closed after 21 days. New replies are no longer allowed.