Tuning for harvesting a large number of files

Hello,

We are having some difficulty configuring Filebeat to forward a large number of log files to our system.

Since we are deploying on another tenant's machine, we are required to keep resource consumption at a reasonable level. However, with this many files it has become difficult for us to create a configuration that both meets that requirement and keeps up with the updates.
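
For reference, the kind of global throttling we have been looking at is sketched below. The option names are the standard 6.x max_procs and queue.mem settings; the values are only placeholders, not something we have validated:

# Sketch only: global caps under consideration (placeholder values)
max_procs: 1              # limit Filebeat to a single CPU core

queue.mem:
  events: 2048            # smaller internal queue -> lower memory footprint
  flush.min_events: 512   # publish in smaller batches
  flush.timeout: 5s       # ...or at least every 5 seconds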

Filebeat (and our Elastic Stack) version: 6.2.2

The machine running Filebeat:

  • RHEL 5.9
  • 2 CPUs
  • 16 GB RAM

The logging files:

Data Type 1:

  • Number of files: 323+
  • File creation: 1/day
  • Update rate: 1 row/minute
  • Total size per file: 80 KB

Data Type 2:

  • Number of files: 323+
  • File creation: 1/day
  • Update rate: 1 row/minute
  • Total size per file: 55 KB

Data Type 3:

  • Number of files: 323+
  • File creation: 1/day
  • Update rate: ~6 rows/10 minutes
  • Total size per file: 2 KB
  • Note: this file has unusual logging behavior: on every update the file is deleted and then recreated with the new records (see the sketch after this list). Alternatively, to avoid this we can consume the historical logs, which are rolled over at midnight and not updated after that.
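
If we were to tail the live Data Type 3 files despite the delete-and-recreate behavior, our understanding is that close_removed and clean_removed (both enabled by default in 6.x) are the relevant options. A sketch of such a prospector, using a hypothetical path placeholder that is not part of our current config:

- ### Data Type 3, live files (hypothetical alternative to the historical logs)
  type: log
  encoding: plain
  paths:
    - {{dtype3.live.file_pattern}}   # placeholder, not in our real config
  fields:
    topic: {{dtype3.topic}}
  close_removed: true    # stop the harvester as soon as the file is deleted
  clean_removed: true    # drop registry state so the recreated file is read from the start
  close_inactive: 1m
  harvester_limit: 124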

Data Type 4:

  • Number of files: 5
  • File creation: 1/day
  • Update rate: Entire file dumped at midnight
  • Total size per file: 30 KB, 225 KB, 565 KB, 15 KB, 177 KB

All logs are single-line records.
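
Since the Data Type 3 historical logs and the Data Type 4 files only change once a day, we assume the per-prospector options scan_frequency, close_eof and clean_inactive could reduce the idle overhead. A rough sketch of what we have in mind for one of those prospectors (the values are guesses):

- ### Once-a-day file, sketch only
  type: log
  encoding: plain
  paths:
    - {{dtype4.file1.file_pattern}}
  fields:
    topic: {{dtype4.file1.topic}}
  scan_frequency: 10m    # the file changes once a day, so scan rarely
  close_eof: true        # release the harvester as soon as EOF is reached
  ignore_older: 168h
  clean_inactive: 169h   # must be greater than ignore_older + scan_frequency
  close_inactive: 1m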

The initial configuration file:

filebeat.prospectors:
- ### Data Type 1
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype1.file_pattern}}
  exclude_files: {{dtype1.exclude_pattern}}
  fields:
    topic: {{dtype1.topic}}
  exclude_lines: [ '^#' ]
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 2
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype2.file_pattern}}
  fields:
    topic: {{dtype2.topic}}
  exclude_lines: [ '^#' ]
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 3
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype3.historical.file_pattern}}
  fields:
    topic: {{dtype3.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 1
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file1.file_pattern}}
  fields:
    topic: {{dtype4.file1.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 2
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file2.file_pattern}}
  fields:
    topic: {{dtype4.file2.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 3
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file3.file_pattern}}
  exclude_files: {{dtype4.file3.exclude_pattern}}
  fields:
    topic: {{dtype4.file3.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 4
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file4.file_pattern}}
  fields:
    topic: {{dtype4.file4.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m
- ### Data Type 4, File 5
  type: log
  encoding: plain
  # enabled: false
  paths:
    - {{dtype4.file5.file_pattern}}
  fields:
    topic: {{dtype4.file5.topic}}
  ignore_older: 168h
  harvester_limit: 124
  close_inactive: 1m

output:
  kafka:
    # enabled: false
    hosts: "${KAF_SERVER_ALL}"
    topic: '%{[fields][topic]}'
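
On the output side, we have also wondered whether batching settings on the Kafka output would help keep resource usage down. Again only a sketch, with made-up values:

output:
  kafka:
    hosts: "${KAF_SERVER_ALL}"
    topic: '%{[fields][topic]}'
    worker: 1              # a single output worker to limit CPU
    bulk_max_size: 1024    # cap the batch size per request
    compression: gzip      # trade some CPU for less network traffic
    required_acks: 1       # wait for the leader only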

Any advice on this use case would be helpful.
