High CPU usage in BEATS

I am using Filebeat to ship logs from the server to Elasticsearch. We use ingest node pipelines to parse the files with the Grok processor.
My filebeat.yml file is below. A single Filebeat process consumes around 500MB of memory.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. 
#=========================== Filebeat inputs =============================

filebeat.inputs:

- type: log
  enabled: false
  paths:
    - /var/log/log.log
  fields:
    logtype: "xxxx"
  exclude_lines: ['.*INFO.*']
  pipeline: "yyyyyyy"
  index: "%{[fields.logtype]}-%{[beat.version]}-%{+yyyy.MM.dd}"
  multiline.pattern: '^([0-9]{4}-[0-9]{2}-[0-9]{2})'
  multiline.negate: true
  multiline.match: after

- type: log
  enabled: true
  fields:
    logtype: "xxxxxxx"
  multiline.match: after
  #multiline.pattern: '^[[:space:]]'
  multiline.pattern: '^([0-9]{4}-[0-9]{2}-[0-9]{2})'
  multiline.negate: true
  multiline.match: after
  #tail_files: true
  #exclude_lines: ['.*INFO.*']
  spool_size: 1
  ignore_older: 48h
  #include_lines: ['^ERR', '^WARN']
  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after
  #multiline.pattern: '^\['
  #multiline.negate: true
  #multiline.match: after

#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false
setup.template.name: "zzzzzzzz"
setup.template.pattern: "yyyyyyy*"
setup.template.overwrite: false
#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#  env: staging

#============================== Dashboards =====================================
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  #index: "%{[fields.logtype]}-%{[beat.version]}-%{+yyyy.MM.dd}"
    - index: "samplelog-%{[agent.version]}-%{+yyyy.MM.dd}"
        message: "SAMPLE TEXT"
    - index: "samplelog3-%{[agent.version]}-%{+yyyy.MM.dd}"
        message: "ERROR"
    - index: "samplelog2-%{[agent.version]}-%{+yyyy.MM.dd}"
        message: "HIDDEN TEXT"
    - pipeline: "xxxxxx"
         message: "JSON"
    - pipeline: "yyyyy"
         message: "ERROR"
    - pipeline: "zzzzzzz"
         message: "HIDDEN TEXT:"

  # Enabled ilm (beta) to use index lifecycle management instead daily indices.
  #ilm.enabled: false

  # Optional protocol and basic auth credentials.
  protocol: "https"
  #username: "elastic"
  #password: "changeme"
#================================ Processors =====================================
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

#================================ Logging =====================================
logging.level: debug
logging.selectors: ["*"]
#xpack.monitoring.enabled: false

Sorry you're having trouble! To clarify, is the problem you're seeing with memory use or CPU (or both)? Since you mention the memory hitting 500MB I'll focus on that.

There are a lot of factors that affect memory use, including log throughput / network speed / etc, but I see you're using multiline so you might be hitting a memory leak that affects multiline configurations and was fixed last week (the fix should be in the next release).

Aside from that, one thing you can try to mitigate the problem, especially if it's partially caused by an indexing bottleneck, is to reduce the internal queue size so fewer messages are stored in memory at once, as in these docs (you might try something like queue.mem.events: 1024).
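
A minimal sketch of how the queue setting suggested above might look at the top level of filebeat.yml. Only the `events: 1024` value comes from this reply; the two flush values are illustrative assumptions rather than tuned recommendations, so adjust them to your throughput.

```yaml
# Sketch only: shrink the in-memory queue so fewer events are buffered at once.
queue.mem:
  # Maximum number of events the queue can hold (value suggested in the reply).
  events: 1024
  # Assumed values: forward a batch once 512 events are waiting or 1s has passed.
  flush.min_events: 512
  flush.timeout: 1s
```

A smaller queue trades some throughput for a lower memory ceiling, so it should help most when the output is the bottleneck and events pile up in memory.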

I tried queue.mem.events: 1024, but it doesn't help. Memory utilization seems to be the same. The problem is with memory: whenever I use the pmap command to check the memory usage, it always shows more than 400MB.

I also tried disabling the multiline options, but the same problem persists.

Any suggestions, please?

In which version of Filebeat has the memory leak been fixed?
