Filebeat consuming high I/O for small files

I have the following architecture:

  1. Files (*.json) of ~1-5 KB get written to a folder throughout the day, which Filebeat scans with a frequency of 10s.
  2. A JSON file, once written to the file system, is never updated.

Below is the configuration:

filebeat.inputs:
  
- type: log
  multiline.pattern: '^\{'
  multiline.negate: true
  multiline.match: after
  multiline.max_lines: 1000000
  multiline.timeout: 10s
  tail_files: false
  
  enabled: true
  paths:
    - /opt/JSON/20220615/*.json
  fields:
    type: json
  fields_under_root: true  
  # This option makes Filebeat identify a file uniquely by its path.
  # In our use case a file is never updated after creation, and files are stored on a network share.
  # So a file path can be used as a unique identifier in our case. The configuration option below serves that purpose.
  file_identity.path: ~  
  ### Harvester Options
  harvester_limit: 4096
  ignore_older: 1h
  scan_frequency: 10s  
  ### Harvester closing options
  close_eof: true
  ### State options
  clean_inactive: 2h

Following are a few questions on which I need input:

  1. Is it fine to have many small files for Filebeat to scan, or is it advisable to append the content into a single JSON file of ~10 MB and rotate it? With the current scheme and configuration, I am observing high I/O.

  2. Even though ignore_older is set to 1h, does Filebeat still open a file handler for already-indexed files older than 1h to check whether they have changed (and then close it), or is the change information available from the inode itself?

  3. Are any other optimizations recommended?

Can anyone suggest if I am missing something?

Any feedback would be appreciated.

I would say that it is better to collect from fewer files, as there is less overhead in the system. If you can control this, try to log to fewer files. But you may end up having similar problems if the file rotates too frequently.

With ignore_older: 1h, any file whose last modification happened more than one hour ago is ignored, and no file handler is opened for it. It doesn't matter whether it has been indexed or not.
If the file was being read, you also need close_inactive to ensure that the file handler is closed.
You can read more about this setting in Log input | Filebeat Reference [8.3] | Elastic
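To illustrate how the two settings work together (the values below are only illustrative, not tuned recommendations):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /opt/JSON/20220615/*.json
  # Files whose last modification is older than 1h are skipped entirely;
  # no handler is opened for them.
  ignore_older: 1h
  # A file that is still being harvested but has produced no new data for
  # this long gets its handler closed, releasing the file descriptor.
  close_inactive: 5m
```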

Have you tried increasing registry.flush? This controls how frequently Filebeat persists its state to disk. Increasing this value may help when you have many files.

Have you tried increasing registry.flush? This controls how frequently Filebeat persists its state to disk. Increasing this value may help when you have many files.

This is currently set as

filebeat.registry.flush: 60s
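If registry writes are contributing to the I/O, the interval can be raised further. A sketch (the 5m value is purely illustrative, not a tested recommendation):

```yaml
# Persist the registry less frequently to reduce disk writes.
# Trade-off: a larger window of events that may be sent again
# after an unclean shutdown, since state is flushed less often.
filebeat.registry.flush: 5m
```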

With ignore_older: 1h, any file whose last modification happened more than one hour ago is ignored, and no file handler is opened for it. It doesn't matter whether it has been indexed or not.

Does a file's modification time reside in the inode, and is that why Filebeat is able to filter out the older files without actually opening a file handler?

Yes, Filebeat relies on the filesystem to obtain the modification time. It can be checked without actually opening the file.
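As a minimal Python sketch of the same idea (not Filebeat's actual code): the modification time is part of the file's metadata (the inode on Linux), so stat() can read it without ever opening the file's contents.

```python
import os
import time

def is_older_than(path: str, max_age_seconds: float) -> bool:
    """Check a file's age using only its metadata, without opening the file."""
    # os.stat reads the inode/metadata; the file's data is never opened.
    mtime = os.stat(path).st_mtime
    return (time.time() - mtime) > max_age_seconds

# Example: create a file, then ask whether it is older than one hour.
with open("/tmp/example.json", "w") as f:
    f.write("{}")

print(is_older_than("/tmp/example.json", 3600))  # a freshly written file: False
```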

Yes, Filebeat relies on the filesystem to obtain the modification time. It can be checked without actually opening the file.

If that is the case, when Filebeat is running against a large number of input files, I wonder why my EFS burst I/O credit balance is decreasing rapidly. Any pointers?

But you may end up having similar problems if the file rotates too frequently.

What is the standard recommendation for rotating the log file? Size-based rotation or time-based rotation?

Any thoughts on the standard rotation policies?

Trying to keep this thread alive in case someone would like to suggest the recommended policies.