Problem with increasing memory usage in the Filebeat agent

Hello everyone,

I installed a Filebeat agent as a service on Windows Server 2012.

The objective is to watch several directories: when an XML file arrives in one of them, the Filebeat agent sends its contents to a Logstash agent.

The problem I encounter is that the Filebeat agent's memory consumption keeps growing the longer it runs and as new files arrive.

For example, while processing a single directory:

Agent started at 12:10: 35 MB memory consumption
Agent stopped at 14:00: 243 MB memory consumption, with 1000 files processed (and thus sent to the Logstash agent) and a 14 MB logging file

In production, we want to parse 5 directories that receive an average of 100,000 files per day in total. These directories are purged daily of files more than 5 days old.

We use Filebeat version 1.2.3 on Windows Server 2012.

My Filebeat configuration is:

################### Filebeat Configuration #########################

############################# Filebeat ######################################
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      paths:
        - D:\Data_GED\P8integrator_*\XMLOK\*.xml
        #- \\wse1621\Data_GED\P8integrator_1\XMLOK\*.xml

      input_type: log
      
      include_lines: ["nom_image"]   
      ignore_older: 1m
      close_older: 1m
      idle_timeout: 5s
      
  registry_file: "D:/AgentSupervision/filebeat-1.2.3-windows-metier/filebeatmetier"


############################# Output ##########################################

# Configure what outputs to use when sending the data collected by filebeat.
# You can enable one or multiple outputs by setting enabled option to true.
output:
  logstash:
    enabled: true
    hosts: ["localhost:5044"]

   # hosts: ["10.228.26.125:5044"]
  #console:
   #  Pretty print json event
   # pretty: true
############################# Logging #########################################

logging:
#  selectors: []
#
#  # Rotator config
#  files:
#    path: D:\AgentSupervision\Int
#    name: beat
#    rotateEveryBytes:
#    keepFiles:
#  to_syslog: false
#  to_files: true
#  level: debug

A few questions here:

  • By the size of the "logging file", I assume you mean the registry file?
  • At 14:00, had Filebeat caught up with all the files, meaning it was harvesting files almost in real time? I would expect memory usage to peak right after startup, when it harvests all files at the same time, and to go down as soon as it finishes them.

Based on the number of files you have, I make the following assumptions:

  • Each file is only updated once with content and then not touched anymore.
  • Files are never rotated.
  • Each directory is for one day?

If this is the case, enabling force_close_files could bring some improvement.
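
As a rough sketch of what that would look like, the prospector from the configuration above could be extended as follows. This assumes force_close_files is set at the prospector level, next to close_older, as in the Filebeat 1.x reference configuration. Please verify that against the documentation for your exact version. Everything except the last option is copied from the posted config.

```yaml
filebeat:
  prospectors:
    -
      paths:
        - D:\Data_GED\P8integrator_*\XMLOK\*.xml
      input_type: log
      include_lines: ["nom_image"]
      ignore_older: 1m
      close_older: 1m
      idle_timeout: 5s
      # Assumption: prospector-level option in Filebeat 1.x.
      # Intended to release the file handle as soon as the file is
      # renamed or removed, instead of waiting for close_older to elapse.
      force_close_files: true
```

Since the files here are written once and never rotated, closing handles early should not lose data, but that depends on the assumptions listed above being correct.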

What is the average size of each harvested file (just a rough number)?

There is currently one issue we are working on: the registry file is never cleaned up. This is normally not a problem, because files are rotated and only a small number of files are harvested. But in your case it seems that all files are unique and the number of files is quite large. The discussion about how to clean up the registry can be found here: https://github.com/elastic/beats/issues/1600 It would be interesting to hear your take on this.

A general question: what kind of logging system do you use that produces this kind of log file?

It would be interesting to hear whether you see improved memory handling with 5.0.0-alpha3, as quite a few improvements under the hood were made for the next major version: https://www.elastic.co/downloads/beats/filebeat Of course, this is not yet recommended for production.
