After Filebeat had been running for a long time, I had no free space on my server.
When I execute /usr/sbin/lsof -p 8095 (the filebeat PID), I see that the process is holding many deleted files:
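To quantify it, these commands count and sum up what the deleted-but-open files are holding (a small sketch; looking up the PID via pgrep is an assumption, substitute your actual PID such as 8095):

```sh
# Count open-but-deleted files held by filebeat
# (pgrep lookup is an assumption; use the real PID if you prefer)
PID=$(pgrep -x filebeat)
/usr/sbin/lsof -p "$PID" | grep -c '(deleted)'
# Sum their sizes (SIZE is column 7 of this lsof output)
/usr/sbin/lsof -p "$PID" | awk '/\(deleted\)/ {sum += $7} END {print sum, "bytes held"}'
```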
Under filebeat 1.2.3 it is possible that a file handle is kept open if a file is rotated during the scan. Some improvements were made in 1.3 to prevent this issue. For version 5 the file handling part was rewritten and this issue does not exist anymore.
For version 1.2.3, currently the only fix when it happens is to restart filebeat. This will free all file handles. Obviously this is not a nice solution. How long was filebeat running until the issue happened?
Hi! Filebeat had been running for 10 minutes.
I tried 1.3 and see this in the debug logs:
```
Launching harvester on new file: /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log. Old file was probably rotated
```
> ... number of events you have per second

I don't know. How can I check?

> ... how often your files are rotated?

When the log reaches 20 MB, the application switches to a new file.
For example:

```
Sep 1 13:35 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4
Sep 1 13:37 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.3
Sep 1 13:40 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.2
Sep 1 13:42 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.1
Sep 1 13:43 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log
```
In the config file:

```yaml
prospectors:
  - paths:
      - /var/opt/tibco/logs/*.log
    input_type: log
    close_older: 5m
    document_type: tibco_6
    multiline:
      pattern: '[BW'
      negate: true
      match: after
    force_close_files: true
```
Maybe the nohup output file from my filebeat would also be interesting for you?
About the number of events: an estimate of how many log lines you create would be good. Just a rough estimate, to understand whether we are talking about a high-load project or not.
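Something like this would give a rough lines-per-second number (a sketch using the file from this thread; the 60-second window is arbitrary):

```sh
# Rough events-per-second estimate: sample the line count twice, 60 s apart.
# Note: the number is off if the file rotates during the sample window.
F=/var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log
A=$(wc -l < "$F"); sleep 60; B=$(wc -l < "$F")
echo "$(( (B - A) / 60 )) lines/sec (approximate)"
```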
For file rotation (kind of related to the previous point): how often is a file rotated because it reaches the 20 MB? Every 10 seconds, 10 minutes, 1 hour?
Hi! Today I tested filebeat-5.0.0-alpha6-SNAPSHOT-linux-x86_64. Same problem: after 10 minutes, the messages about deleted files began to appear.
One file has about 10K rows. All events are written to the file VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log. When this file reaches 20 MB, it is renamed to VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.1 and the file VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4 is deleted.
Below you can see the log rotation times:
```
Sep 2 14:07 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4
Sep 2 14:09 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.3
Sep 2 14:11 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.2  - not in use
Sep 2 14:14 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.1  - not in use
Sep 2 14:14 VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log    - in use by the application
```
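The rotation behaves roughly like this rename chain (a sketch I reconstructed from the timestamps above, not the application's actual code):

```sh
# Reconstruction of the rename-based rotation (illustrative only)
d=/var/opt/tibco/logs
base=VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log
rm -f "$d/$base.4"                       # oldest copy is deleted
for i in 3 2 1; do
  [ -f "$d/$base.$i" ] && mv "$d/$base.$i" "$d/$base.$((i+1))"
done
mv "$d/$base" "$d/$base.1"               # active file is renamed; the app reopens a fresh one
# Any process still holding the old .log.4 inode (e.g. filebeat) keeps its
# disk space allocated until the handle is closed.
```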
Test with lsof:

```
[tibco]$ /usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log.4
[tibco]$ /usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log
COMMAND   PID   USER  FD   TYPE DEVICE SIZE     NODE   NAME
bwengine  26344 tibco 303w REG  8,17   10276872 639075 /var/opt/tibco/logs/VPartnerGw-ussdtail-VPartnerGw-ussdtail-6.log
```

(The first command returns nothing: no process holds the .log.4 file open.)
Filebeat watches the logs directory, which is used to store the logs of about 30 applications; ussdtail is the most active.
If I run lsof on the rotated file (/usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4), I see that the file is not being watched by filebeat or any other process.
Messages from the Filebeat log:
```
2016/09/05 10:09:03.757905 prospector_log.go:140: DBG Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:03.757916 prospector_log.go:173: DBG Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:03.757924 prospector_log.go:223: DBG Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:13.762543 prospector_log.go:140: DBG Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:13.762559 prospector_log.go:173: DBG Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:13.762572 prospector_log.go:223: DBG Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:23.767274 prospector_log.go:140: DBG Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:23.767285 prospector_log.go:173: DBG Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:23.767294 prospector_log.go:223: DBG Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:33.771490 prospector_log.go:140: DBG Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:33.771501 prospector_log.go:173: DBG Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:33.771510 prospector_log.go:223: DBG Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:43.775458 prospector_log.go:140: DBG Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:43.775469 prospector_log.go:173: DBG Update existing file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log, offset: 150785
2016/09/05 10:09:43.775477 prospector_log.go:223: DBG Harvester for file is still running: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
2016/09/05 10:09:53.779052 prospector_log.go:140: DBG Check file for harvesting: /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log
```
I tried to reformat your post above. Please use 3 ticks around code blocks to make it more readable.
The indentation for close_removed, close_renamed seems to be off. Not sure if that is a copy/paste problem or if it is that way in the config file?
Could you elaborate on this sentence: "I see that the file is not being watched by filebeat or any other process."?
For the filebeat log file, I would expect more debug messages with additional information between these outputs. Did you filter out part of it or is that everything from the log file?
> The indentation for close_removed, close_renamed seems to be off. Not sure if that is a copy/paste problem or if it is that way in the config file?

It was that way in the config file. I have now commented out these options and restarted filebeat. The problem persists.
> Could you elaborate on this sentence: "I see that the file is not being watched by filebeat or any other process."?

If I run lsof on the rotated file (/usr/sbin/lsof /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4), I see that filebeat does NOT have the file /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4 open; it watches only /var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log. But if at the same time I check

```
lsof -p PID_FILEBEAT | grep 'deleted'
```

I get rows marked (deleted). For example:

```
/var/opt/tibco/logs/VPartnerGw-vmeste-VPartnerGw-vmeste-prod6.log.4 (deleted)
```
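To see the leak grow, I can repeat the check in a loop (a sketch; the pgrep lookup and the 60-second interval are assumptions):

```sh
# Print the count of deleted-but-open handles once a minute
while true; do
  n=$(/usr/sbin/lsof -p "$(pgrep -x filebeat)" | grep -c '(deleted)')
  echo "$(date '+%H:%M:%S') deleted handles: $n"
  sleep 60
done
```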
```
2016/09/09 11:11:50.008100 log.go:42: ERR Stop Harvesting. Unexpected file opening error: open /var/opt/tibco/logs/partnergateway-PartnerGatewayConnector-prod6.log: too many open files
```
It seems like filebeat is potentially opening too many files, which could lead to this issue. How many files do you harvest at the same time? What is your ulimit?
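You can check all of that like this (a sketch; it assumes a single filebeat process):

```sh
# How many files match the prospector glob (candidates for concurrent harvesting)
ls /var/opt/tibco/logs/*.log | wc -l

PID=$(pgrep -x filebeat)                # assumption: exactly one filebeat process
grep 'open files' "/proc/$PID/limits"   # effective soft/hard open-file limit
ls "/proc/$PID/fd" | wc -l              # descriptors currently held
```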
Another thing I realised is that filebeat is constantly flushing events. The default for spool_size is 50, which is good to get started and uses little memory, but for lots of events this value should be increased. Can you try to set it to filebeat.spool_size: 2048 or even higher?
It seems there are several issues where one leads to another:

- Events are not sent fast enough -> files are not closed, because harvesting is blocked by the output.
- Because files are not closed -> too many files are open.
- Because too many files are open -> errors appear on the output, because the output also needs file descriptors: `2016/09/09 12:53:33.051400 single.go:140: ERR Connecting error publishing events (retrying): lookup srv-testapp-001t on 10.254.0.3:53: dial udp 10.254.0.3:53: socket: too many open files`
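As a temporary mitigation (not a fix for the root cause), you could raise the file descriptor limit in the shell that starts filebeat; a sketch, assuming you start it with nohup as you mentioned (the limit value and paths are assumptions):

```sh
# Temporary mitigation: raise the fd limit before starting filebeat
# (65536 and the binary/config paths are assumptions; adjust to your install)
ulimit -n 65536
nohup ./filebeat -c filebeat.yml -e > filebeat.nohup.out 2>&1 &
```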
You should make sure that events can be sent from filebeat faster than new events are created. Increase the recommended value above and check the following lines: