Tuning for harvesting a large number of files

Hello,

We are having some difficulty configuring Filebeat to forward a large number of log files to our system.

Since we are deploying on another tenant's machine, we are required to keep resource consumption at a reasonable level. However, with this many files it has become difficult for us to create a configuration that both meets that requirement and keeps up with the updates.
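
For reference, the kind of global throttling we have been looking at is sketched below. The option names are the standard 6.x max_procs and queue.mem settings; the values are only placeholders, not something we have validated:

# Sketch only: global caps under consideration (placeholder values)
max_procs: 1              # limit Filebeat to a single CPU core

queue.mem:
  events: 2048            # smaller internal queue -> lower memory footprint
  flush.min_events: 512   # publish in smaller batches
  flush.timeout: 5s       # ...or at least every 5 seconds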

Filebeat (and our Elastic Stack) version: 6.2.2

The machine running Filebeat:

  • RHEL 5.9
  • 2 CPUs
  • 16 GB RAM

The logging files:

Data Type 1:

  • Number of files: 323+
  • File creation: 1/day
  • Update rate: 1 row/minute
  • Total size per file: 80 KB

Data Type 2:

  • Number of files: 323+
  • File creation: 1/day
  • Update rate: 1 row/minute
  • Total size per file: 55 KB

Data Type 3:

  • Number of files: 323+
  • File creation: 1/day
  • Update rate: ~6 rows/10 minutes
  • Total size per file: 2 KB
  • Note: this file has unusual logging behavior: on every update the file is deleted and then recreated with the new records (see the sketch after this list). Alternatively, to avoid this we can consume the historical logs, which are rolled over at midnight and not updated after that.
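
If we were to tail the live Data Type 3 files despite the delete-and-recreate behavior, our understanding is that close_removed and clean_removed (both enabled by default in 6.x) are the relevant options. A sketch of such a prospector, using a hypothetical path placeholder that is not part of our current config:

- ### Data Type 3, live files (hypothetical alternative to the historical logs)
  type: log
  encoding: plain
  paths:
    - {{dtype3.live.file_pattern}}   # placeholder, not in our real config
  fields:
    topic: {{dtype3.topic}}
  close_removed: true    # stop the harvester as soon as the file is deleted
  clean_removed: true    # drop registry state so the recreated file is read from the start
  close_inactive: 1m
  harvester_limit: 124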

Data Type 4:

  • Number of files: 5
  • File creation: 1/day
  • Update rate: Entire file dumped at midnight
  • Total size per file: 30 KB, 225 KB, 565 KB, 15 KB, 177 KB

All logs are single-line records.
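
Since the Data Type 3 historical logs and the Data Type 4 files only change once a day, we assume the per-prospector options scan_frequency, close_eof and clean_inactive could reduce the idle overhead. A rough sketch of what we have in mind for one of those prospectors (the values are guesses):

- ### Once-a-day file, sketch only
  type: log
  encoding: plain
  paths:
    - {{dtype4.file1.file_pattern}}
  fields:
    topic: {{dtype4.file1.topic}}
  scan_frequency: 10m    # the file changes once a day, so scan rarely
  close_eof: true        # release the harvester as soon as EOF is reached
  ignore_older: 168h
  clean_inactive: 169h   # must be greater than ignore_older + scan_frequency
  close_inactive: 1m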

The initial configuration file:

filebeat.prospectors:
- ### Data Type 1
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype1.file_pattern}}
  exclude_files: {{dtype1.exclude_pattern}}
  fields:
    topic: {{dtype1.topic}}
  exclude_lines: [ '^#' ]
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 2
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype2.file_pattern}}
  fields:
    topic: {{dtype2.topic}}
  exclude_lines: [ '^#' ]
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 3
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype3.historical.file_pattern}}
  fields:
    topic: {{dtype3.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 1
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file1.file_pattern}}
  fields:
    topic: {{dtype4.file1.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 2
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file2.file_pattern}}
  fields:
    topic: {{dtype4.file2.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 3
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file3.file_pattern}}
  exclude_files: {{dtype4.file3.exclude_pattern}}
  fields:
    topic: {{dtype4.file3.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 4
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file4.file_pattern}}
  fields:
    topic: {{dtype4.file4.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 5
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file5.file_pattern}}
  fields:
    topic: {{dtype4.file5.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m

output:
  kafka:
    # enabled: false
    hosts: "${KAF_SERVER_ALL}"
    topic: '%{[fields][topic]}'
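
On the output side, we have also wondered whether batching settings on the Kafka output would help keep resource usage down. Again only a sketch, with made-up values:

output:
  kafka:
    hosts: "${KAF_SERVER_ALL}"
    topic: '%{[fields][topic]}'
    worker: 1              # a single output worker to limit CPU
    bulk_max_size: 1024    # cap the batch size per request
    compression: gzip      # trade some CPU for less network traffic
    required_acks: 1       # wait for the leader only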

Any advice on this use case would be helpful.
