Auditbeat memory leak

Hi All,

I was wondering if anyone else has noticed a memory leak with the last few versions of Auditbeat?

I'm currently running 7.10.0 (and had the issue in 7.8 and 7.9 as well), and Auditbeat is being OOM-killed roughly once a day. I'm only seeing it on relatively high-traffic DNS servers. The DNS servers in question receive between 100-400 queries per second, which doesn't seem like a load that should cause Auditbeat to consume so many resources.

Here is the config I'm running:

#==========================  Modules configuration =============================
auditbeat.modules:

- module: auditd
  resolve_ids: true
  failure_mode: silent
  backlog_limit: 8192
  rate_limit: 0
  include_raw_message: false
  include_warnings: false
  backpressure_strategy: auto
  # Load audit rules from separate files. Same format as audit.rules(7).
  audit_rule_files: [ '${path.config}/audit.rules.d/*.conf' ]
  audit_rules: |
    ## Define audit rules here.
    ## Create file watches (-w) or syscall audits (-a or -A). Uncomment these
    ## examples or add your own rules.

    ## If you are on a 64 bit platform, everything should be running
    ## in 64 bit mode. This rule will detect any use of the 32 bit syscalls
    ## because this might be a sign of someone exploiting a hole in the 32
    ## bit API.
    -a always,exit -F arch=b32 -S all -F key=32bit-abi

    ## Executions.
    -a always,exit -F arch=b64 -S execve,execveat -k exec

    ## External access (warning: these can be expensive to audit).
    ##-a always,exit -F arch=b64 -S accept,bind,connect -F key=external-access
    ##-a always,exit -F arch=b64 -S accept,bind -F key=external-access

    ## Identity changes.
    -w /etc/group -p wa -k identity
    -w /etc/passwd -p wa -k identity
    -w /etc/gshadow -p wa -k identity
    -w /etc/shadow -p wa -k identity

    ## Unauthorized access attempts.
    -a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EACCES -k access
    -a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access
    -a always,exit -F arch=b64 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EACCES -k access
    -a always,exit -F arch=b64 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access

- module: file_integrity
  paths:
  - /bin
  - /usr/bin
  - /sbin
  - /usr/sbin
  - /etc
  - /root
  - /usr/local/bin
  - /home
  exclude_files:
  - '(?i)\.sw[nop]$'
  - '~$'
  - '/\.git($|/)'
  - '\.rrd$'
  include_files: []
  scan_at_start: true
  scan_rate_per_sec: 50 MiB
  max_file_size: 100 MiB
  hash_types: [md5,sha256]
  recursive: true

- module: system
  datasets:
    - host    # General host information, e.g. uptime, IPs
    - login   # User logins, logouts, and system boots.
    - package # Installed, updated, and removed packages
    - process # Started and stopped processes
    - socket  # Opened and closed sockets
    - user    # User information

  # How often datasets send state updates with the
  # current state of the system (e.g. all currently
  # running processes, all open sockets).
  state.period: 12h

  # Enabled by default. Auditbeat will read password fields in
  # /etc/passwd and /etc/shadow and store a hash locally to
  # detect any changes.
  user.detect_password_changes: true

  # File patterns of the login record files.
  login.wtmp_file_pattern: /var/log/wtmp*
  login.btmp_file_pattern: /var/log/btmp*


#================================ Outputs =====================================

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["<snipped>","<snipped>","<snipped>"]
  loadbalance: true

#================================ Processors =====================================

# Configure processors to enhance or manipulate events generated by the beat.

processors:
  - add_host_metadata: ~
  - add_tags:
      tags: [auditbeat]
  - dns:
      type: reverse
      fields:
        server.ip: server.hostname
        client.ip: client.hostname
        source.ip: source.hostname
        destination.ip: destination.hostname
      nameservers: ['<snipped>','<snipped>','<snipped>']
      tag_on_failure: [_dns_reverse_lookup_failed]
#  - add_cloud_metadata: ~
#  - add_docker_metadata: ~

#================================ Logging =====================================

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/auditbeat
  name: auditbeat
  keepfiles: 2
  permissions: 0600
  rotateeverybytes: 5242880

#============================== X-Pack Monitoring ===============================

monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["<snipped>","<snipped>","<snipped>"]
  protocol: "https"
  username: "<snipped>"
  password: "<snipped>"
  ssl.enabled: true
  ssl.verification_mode: full
  ssl.certificate_authorities: ["<snipped>"]
monitoring.cluster_uuid: "<snipped>"

The DNS servers are CentOS 8:

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

Auditbeat slowly eats RAM over time until Linux OOM-kills it, and the process then has to be restarted.

Below is a screen shot of it occurring:

If you can capture a memory profile when usage gets high, that will probably help figure out what is leaking. If you add --httpprof=localhost:5060 to the service's arguments, you can collect a profile with curl http://localhost:5060/debug/pprof/heap > heap.bin and then analyze it with go tool pprof -web heap.bin.
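On CentOS 8 the extra argument is usually added through a systemd drop-in. A sketch of what that could look like is below; the unit name, binary path, and existing arguments are assumptions based on the stock RPM layout, so verify yours with systemctl cat auditbeat before copying:

```ini
# /etc/systemd/system/auditbeat.service.d/httpprof.conf
# NOTE: ExecStart path and flags below are assumed defaults -- check
# `systemctl cat auditbeat` on your host and mirror its ExecStart line,
# appending only the --httpprof flag.
[Service]
ExecStart=
ExecStart=/usr/share/auditbeat/bin/auditbeat --environment systemd --httpprof localhost:5060
```

After creating the file, run systemctl daemon-reload and systemctl restart auditbeat so the new argument takes effect.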

After you collect the profile, I would try disabling the DNS correlation for the system/socket dataset by setting socket.dns.enabled: false in the system module settings: https://www.elastic.co/guide/en/beats/auditbeat/current/auditbeat-dataset-system-socket.html

- module: system
  datasets:
    - socket
    - ...
  socket.dns.enabled: false

Thanks, I've added the argument to one of the servers and will wait for the memory to start being used again.

@andrewkroh I wasn't able to upload the .bin file here. I've attached a screenshot of the graph instead.

I looked over the source code. It may not be a leak, but rather a large cache that is constantly full of new DNS transactions; DNS metadata is removed from the cache after 30s. It's probably best to turn the feature off on a DNS server.

It captures packets with UDP source port 53 that contain A or AAAA answers. Then, when the host makes a "connection" to one of those IPs, it enriches the connection event with the hostname from the DNS packet. A DNS server will naturally capture far more of these packets than an ordinary host.
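To illustrate why a busy DNS server sees large steady-state memory rather than a true leak, here is a minimal sketch of that kind of TTL cache. This is not Auditbeat's actual code; the class and method names are hypothetical, and only the 30-second expiry and the IP-to-hostname mapping are taken from the description above:

```python
import time

CACHE_TTL = 30.0  # seconds; matches the 30s expiry described above


class DNSCache:
    """Hypothetical sketch of an IP -> hostname enrichment cache."""

    def __init__(self, ttl=CACHE_TTL, now=time.monotonic):
        self.ttl = ttl
        self.now = now        # injectable clock, handy for testing
        self.entries = {}     # ip -> (hostname, expiry_time)

    def add_answer(self, ip, hostname):
        """Record an A/AAAA answer seen in a captured DNS response."""
        self.entries[ip] = (hostname, self.now() + self.ttl)

    def enrich(self, ip):
        """Return the cached hostname for a connection to `ip`, if fresh."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        hostname, expires = entry
        if self.now() >= expires:
            del self.entries[ip]  # expired entry: evict and miss
            return None
        return hostname
```

On a server answering hundreds of queries per second, every forwarded answer lands in a cache like this, so tens of thousands of entries can be live at any moment even though each one expires after 30 seconds.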

@andrewkroh After letting Auditbeat run for a while with socket.dns.enabled: false, I am no longer seeing the memory issues. Thanks for the help. For my future reference/understanding, is this something that should be done for all DNS servers?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.