Filebeat: Excessive disk usage ( ~15M/s) when no line is matched in the input log file


(Rishi Dev Singh) #1

Hi,

I am using filebeat 6.2.0 vesrion. In one of my run I found filebeat is writing to the disk (~15Mbps) when there is no matching line found in the log file (size of file = 500 MB).

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    -  /work/filebeatTesting/dummyGenerator/13_exception_1/*/LOGS/*.log
  include_lines: ['\n']
  exclude_files: ['^STATISTICS','stderr.log']
  multiline.pattern: ^[0-9]
  multiline.negate: true
  multiline.match: after

fields:
   env: finaldemo_test
   log_type: application
   client_name: rand
output.logstash:
  hosts: ["172.28.0.166:5044"]

Any lead will be helpful.

Thanks and Regards


(ruflin) #2

Any chance you could share your log file and even enable debug logging. I suspect either there is something written to the log file or the registry is updated very frequently.


(Rishi Dev Singh) #3

Hi Ruflin,

Thanks for your feedback. I also suspect that the registry file is being updated frequently which in turns increases the filebeat disk usage. As from the log it seems after reading every non matched line the filebeat writes into the registry file.

2018-02-27T06:30:10.402Z|DEBUG|[registrar]|registrar/registrar.go:200|Processing 1 events|
2018-02-27T06:30:10.404Z|DEBUG|[registrar]|registrar/registrar.go:193|Registrar states cleaned up. Before: 1, After: 1|
2018-02-27T06:30:10.404Z|DEBUG|[registrar]|registrar/registrar.go:228|Write registry file: /var/lib/filebeat/registry|
2018-02-27T06:30:10.402Z|DEBUG|[harvester]|log/harvester.go:362|Drop line as it does not match any of the include patterns 11:27:54:000683|0420-00058:JMIX: MQSubscription {INFO} Action {MQ supply received} QueueName {NA.XTP_MQ_XTP_ACCT_HIERARCHY.account_master_standard.0_accounting_manager_standard_2c8128c8_2b09_43e1_a4b4_5c04f04608cb} MkvMQAction {ADD} Record {NA.XTP_MQ_XTP_ACCT_HIERARCHY.account_master_standard.GTMHCL030} Id {GTMHCL030} Account {GTMHCL030} Salesman {GREENSAL} Location {GREENLOC} Firm {GREEN} Thread {MkvConnection(router_m@127.0.0.1:19500)}|

Please let me know the location where i can upload the logs. The result of iotop shows that almost filebeat is consuming 15-18 M/s.

Total DISK READ: 254.61 K/s | Total DISK WRITE: 18.10 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
 1019 be/3 root        0.00 B/s    7.96 K/s  0.00 % 69.52 % [jbd2/nvme0n1p1-]
 8855 be/4 root        0.00 B/s 1834.00 K/s  0.00 % 26.53 % filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
 8852 be/4 root      127.31 K/s   83.54 K/s  0.00 %  0.00 % filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
 8857 be/4 root        0.00 B/s   91.50 K/s  0.00 %  0.00 % filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
 8859 be/4 root        0.00 B/s   51.72 K/s  0.00 %  0.00 % filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
 8866 be/4 root      127.31 K/s   27.85 K/s  0.00 %  0.00 % filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat

Please suggest me is there any setting so that I can reduce the frequency of updating the registry file to the disk.

Thanks and Regards


(Rishi Dev Singh) #4

Hi Ruflin,

By further debugging and diving into the code, I am almost sure that this issue is being happened because of the updating of registry file frequently.
Will you please provide me the setting used to set the RegistryFlush timeout duration.

Thanks and Regard


(Rishi Dev Singh) #5

Hi,

I have found one setting that controls the flush time out of the registry file.

filebeat.registry_flush: 100s

Please suggest me is it safe to use this setting in the production where the number of filetered event would be very small(~.01%) compared to the overall events.

Thanks and Regards


(ruflin) #6

Yes, that is the setting I just wanted you to recommend using. I would probably set it to a lower value like 1-2s. 100s also work as long as your machine does not crash. In case of a crash it could happen that all data from the last 100s is resent as the registry was not written yet with the most recent state info.


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.