Memory leak on Kubernetes nodes

Hi, I'm seeing a memory issue when running Auditbeat on Kubernetes nodes.
I have tried Auditbeat 8.12 and 8.13.
Here is the config I'm running:

auditbeat.modules:

- module: auditd
  processors:
    - add_session_metadata:
        backend: "auto"
    - add_docker_metadata:
    - drop_event:
        when:
          or:
            - has_fields: ['container']
            - contains:
                process.entry_leader.entry_meta.type: "container"
            - contains:
                process.entry_leader.args: "containerd"

  audit_rules: |
    -a exit,always -F arch=b64 -F euid=0 -S execve -k rootact
    -a exit,always -F arch=b32 -F euid=0 -S execve -k rootact
    -a always,exit -F arch=b64 -S connect -F a2=16 -F success=1 -F key=network_connect_4
    -a always,exit -F arch=b64 -F exe=/bin/bash -F success=1 -S connect -k "remote_shell"
    -a always,exit -F arch=b64 -F exe=/usr/bin/bash -F success=1 -S connect -k "remote_shell"

Auditbeat eats RAM over time until Linux OOM-kills it, and the process then requires a restart.

Each sharp drop in memory consumption on the graph corresponds to a restart of the auditbeat service or of the server.

Thanks in advance.

Hi,

I think the large memory usage is likely caused by the add_session_metadata processor. When it's in use, it tracks all process forks and executions in memory. The memory usage should be bounded, but it can grow quite large.

There was a large architectural redesign of the add_session_metadata processor in 8.16 that may help with this. Could you upgrade to Auditbeat 8.16 or higher? Or if you're not using the data provided by this processor, you could remove it from the config.

Yes, I also believe it is caused by the add_session_metadata processor.
We tried 8.16 and 8.17; it didn't help.

We use it to exclude events from inside containers from the audit. Can you recommend another way to ensure that only events caused by processes of the k8s node itself are included in the audit, excluding containers?

Can you confirm it is indeed the add_session_metadata processor?
Running one instance without the processor for a while should be enough to see if the pattern changes.

After that, can you get a memory profile? You will need to set http.pprof.enabled; see Configure an HTTP endpoint for metrics | Auditbeat Reference [8.17] | Elastic.
Just be sure to take the dump when the memory usage is high enough.
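
For reference, this is roughly what the relevant settings look like in auditbeat.yml (the host and port here are just examples, adjust them for your environment):

    http.enabled: true
    http.host: localhost
    http.port: 5066
    http.pprof.enabled: true

With that set, fetching http://localhost:5066/debug/pprof/heap should give you a heap snapshot that can be inspected with go tool pprof.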

Yes, I disabled the add_session_metadata processor and the memory leak stopped.

@dimaz can you try adding -a always,exit -F arch=b64 -S exit_group to your audit rules?

@Michael_Wolf It looks like if we don't get any exit_group events, the processor DB instance won't reap any dead processes?

I added -a always,exit -F arch=b64 -S exit_group and it had no effect on the memory leak.
Here is the output I got from the monitoring endpoint after setting http.pprof.enabled:

master node:

{
"auditd": {
"kernel_lost": 31435,
"reassembler_seq_gaps": 884763805174,
"received_msgs": 1492935,
"userspace_lost": 0
},
"beat": {
"cgroup": {
"cpu": {
"cfs": {
"period": {
"us": 100000
},
"quota": {
"us": 0
}
},
"id": "auditbeat.service",
"stats": {
"periods": 0,
"throttled": {
"ns": 0,
"periods": 0
}
}
},
"cpuacct": {
"id": "auditbeat.service",
"total": {
"ns": 341370860453
}
},
"memory": {
"id": "auditbeat.service",
"mem": {
"limit": {
"bytes": 9223372036854771712
},
"usage": {
"bytes": 1127354368
}
}
}
},
"cpu": {
"system": {
"ticks": 46590,
"time": {
"ms": 46590
}
},
"total": {
"ticks": 341360,
"time": {
"ms": 341360
},
"value": 341360
},
"user": {
"ticks": 294770,
"time": {
"ms": 294770
}
}
},
"handles": {
"limit": {
"hard": 262144,
"soft": 262144
},
"open": 26
},
"info": {
"ephemeral_id": "13f39bfc-d91f-43f0-9a95-21636139775d",
"name": "auditbeat",
"uptime": {
"ms": 190182143
},
"version": "8.17.0"
},
"memstats": {
"gc_next": 40572544,
"memory_alloc": 26293464,
"memory_sys": 79088195,
"memory_total": 56586289080,
"rss": 1112170496
},
"runtime": {
"goroutines": 27
}
},
"libbeat": {
"config": {
"module": {
"running": 0,
"starts": 0,
"stops": 0
},
"reloads": 0,
"scans": 0
},
"output": {
"batches": {
"split": 0
},
"events": {
"acked": 316014,
"active": 0,
"batches": 30604,
"dead_letter": 0,
"dropped": 0,
"duplicates": 0,
"failed": 23,
"toomany": 0,
"total": 316037
},
"read": {
"bytes": 1648146,
"errors": 1
},
"type": "logstash",
"write": {
"bytes": 126110743,
"errors": 1,
"latency": {
"histogram": {
"count": 0,
"max": 0,
"mean": 0,
"median": 0,
"min": 0,
"p75": 0,
"p95": 0,
"p99": 0,
"p999": 0,
"stddev": 0
}
}
}
},
"pipeline": {
"clients": 1,
"events": {
"active": 2,
"dropped": 0,
"failed": 0,
"filtered": 0,
"published": 316015,
"retry": 53,
"total": 316016
},
"queue": {
"acked": 316014,
"added": {
"bytes": 0,
"events": 316015
},
"consumed": {
"bytes": 0,
"events": 316014
},
"filled": {
"bytes": 0,
"events": 1,
"pct": 0.00048828125
},
"max_bytes": 0,
"max_events": 2048,
"removed": {
"bytes": 0,
"events": 316014
}
}
}
},
"metricbeat": {
"auditd": {
"auditd": {
"consecutive_failures": 0,
"events": 316017,
"failures": 116,
"success": 315903
}
}
},
"system": {
"cpu": {
"cores": 2
},
"load": {
"1": 0.34,
"15": 0.28,
"5": 0.27,
"norm": {
"1": 0.17,
"15": 0.14,
"5": 0.135
}
}
}
}

worker node:

{
"auditd": {
"kernel_lost": 22594,
"reassembler_seq_gaps": 5793911785175,
"received_msgs": 5855068,
"userspace_lost": 0
},
"beat": {
"cgroup": {
"cpu": {
"cfs": {
"period": {
"us": 100000
},
"quota": {
"us": 0
}
},
"id": "auditbeat.service",
"stats": {
"periods": 0,
"throttled": {
"ns": 0,
"periods": 0
}
}
},
"cpuacct": {
"id": "auditbeat.service",
"total": {
"ns": 1031281329977
}
},
"memory": {
"id": "auditbeat.service",
"mem": {
"limit": {
"bytes": 9223372036854771712
},
"usage": {
"bytes": 5337092096
}
}
}
},
"cpu": {
"system": {
"ticks": 139370,
"time": {
"ms": 139370
}
},
"total": {
"ticks": 1031270,
"time": {
"ms": 1031270
},
"value": 1031270
},
"user": {
"ticks": 891900,
"time": {
"ms": 891900
}
}
},
"handles": {
"limit": {
"hard": 262144,
"soft": 262144
},
"open": 58
},
"info": {
"ephemeral_id": "057fbe51-89e4-4f37-8bff-6663e94dd299",
"name": "auditbeat",
"uptime": {
"ms": 318090029
},
"version": "8.17.0"
},
"memstats": {
"gc_next": 48628040,
"memory_alloc": 32350416,
"memory_sys": 127224387,
"memory_total": 136483867096,
"rss": 5384351744
},
"runtime": {
"goroutines": 29
}
},
"libbeat": {
"config": {
"module": {
"running": 0,
"starts": 0,
"stops": 0
},
"reloads": 0,
"scans": 0
},
"output": {
"batches": {
"split": 0
},
"events": {
"acked": 852111,
"active": 0,
"batches": 54461,
"dead_letter": 0,
"dropped": 0,
"duplicates": 0,
"failed": 97,
"toomany": 0,
"total": 852208
},
"read": {
"bytes": 2701408,
"errors": 2
},
"type": "logstash",
"write": {
"bytes": 382956376,
"errors": 3,
"latency": {
"histogram": {
"count": 0,
"max": 0,
"mean": 0,
"median": 0,
"min": 0,
"p75": 0,
"p95": 0,
"p99": 0,
"p999": 0,
"stddev": 0
}
}
}
},
"pipeline": {
"clients": 1,
"events": {
"active": 5,
"dropped": 0,
"failed": 0,
"filtered": 24376,
"published": 852115,
"retry": 258,
"total": 876492
},
"queue": {
"acked": 852111,
"added": {
"bytes": 0,
"events": 852115
},
"consumed": {
"bytes": 0,
"events": 852111
},
"filled": {
"bytes": 0,
"events": 4,
"pct": 0.001953125
},
"max_bytes": 0,
"max_events": 2048,
"removed": {
"bytes": 0,
"events": 852111
}
}
}
},
"metricbeat": {
"auditd": {
"auditd": {
"consecutive_failures": 0,
"events": 876493,
"failures": 959,
"success": 875538
}
}
},
"system": {
"cpu": {
"cores": 10
},
"load": {
"1": 0.1,
"15": 0.12,
"5": 0.1,
"norm": {
"1": 0.01,
"15": 0.012,
"5": 0.01
}
}
}
}

Alright, I can't reproduce the full leak, but I suspect there's some kind of system-dependent behavior going on here.

@dimaz

The docs mention setting a number of audit rules specifically for this processor; can you make sure they're all set?

    ## executions
    -a always,exit -F arch=b64 -S execve,execveat -k exec
    -a always,exit -F arch=b64 -S exit_group
    ## set_sid
    -a always,exit -F arch=b64 -S setsid
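
One way to double-check what is actually loaded in the kernel (assuming auditctl is available on the host):

    auditctl -l | grep -E 'execve|exit_group|setsid'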

Also, can you tell us more about the environment you're running on? Is this a K8s cloud service? Is there a host you can run uname -a on and give us the output?

We don't have a rule
-a always,exit -F arch=b64 -S setsid
I'll add it.

These are our own on-premises Kubernetes servers.

# uname -a
Linux hostname.example.com 5.15.0-210.163.7.el8uek.x86_64

When server activity is low, the memory grows slowly; the leak accelerates as the load increases.

So, the good news is that I think I've reproduced this. I think different components are just keying off different values in the database that the processor uses. I still haven't found a workaround.

Currently working on a fix.

Hi all! I have a related question.

While we are waiting for a fix, we have limited auditbeat's memory usage with cgroups. We have noticed that when auditbeat runs low on memory, disk I/O usage starts to increase significantly, even though swap is off and no tools like systemd-swap are used. Using 'top' we see heavy kswapd activity. As a result, the operation of our servers can be disrupted.

Could you please kindly clarify why this happens and if this situation can be avoided?

For reference, the fix for the memory leak issue should be available in 8.19.

Have you been able to verify that auditbeat is the process using swap memory? You can check by running grep VmSwap /proc/[pid of auditbeat]/status.
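
If it's not obvious which process is using swap, a rough sketch like the following (it just reads /proc, nothing auditbeat-specific) lists every process with non-zero VmSwap:

    # List processes with non-zero swap usage by reading /proc directly.
    for pid in /proc/[0-9]*; do
        swap=$(awk '/^VmSwap:/ {print $2}' "$pid/status" 2>/dev/null)
        if [ -n "$swap" ] && [ "$swap" -gt 0 ]; then
            printf '%s\t%s kB\t%s\n' "${pid#/proc/}" "$swap" "$(cat "$pid/comm" 2>/dev/null)"
        fi
    done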


Thanks. Currently VmSwap is 0 everywhere; we will check again when the memory consumption starts growing. We will also check v8.19.

Could you please provide an update on when I can expect the memory leak problem to be fixed?

Thank you for your assistance.

@Alex_Kristiansen fixed it in February here: Handle leak of process info in `hostfs` provider for `add_session_met… · elastic/beats@d6ff82b · GitHub

Hi everybody,

We built it from commit d6ff82b. It's version 9.1.0, but it does not support add_session_metadata.

This is the full error message:
auditbeat[3320949]: Exiting: the processor action add_session_metadata does not exist. Valid actions: add_fields, add_network_direction, append, community_id, decode_xml, dissect, truncate_fields, add_kubernetes_metadata, add_labels, decompress_gzip_field, add_observer_metadata, decode_duration, decode_base64_field, decode_json_fields, drop_event, include_fields, add_host_metadata, dns, extract_array, uppercase, add_locale, add_process_metadata, decode_xml_wineventlog, registered_domain, syslog, convert, add_formatted_index, add_tags, lowercase, rename, replace, add_docker_metadata, add_id, translate_ldap_attribute, urldecode, copy_fields, drop_fields, add_cloud_metadata, rate_limit, detect_mime_type, fingerprint, move_fields, script

Is there any version of auditbeat which supports add_session_metadata and also has fixed memory leak?

Hello,

We have upgraded Auditbeat on our servers (some nodes to 8.19.4, others to 9.1.4), but the memory consumption issue still persists.

Auditbeat memory usage continues to grow over time until we manually restart the service. After restart, memory drops, but then starts increasing again.

To illustrate, I created a script that records Auditbeat memory usage (RSS).
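
It is essentially a small cron job along these lines (the log path and the process lookup are illustrative):

    #!/bin/sh
    # Append a timestamped RSS sample for the auditbeat process to a log file; run hourly from cron.
    pid=$(pidof auditbeat) || exit 0
    rss=$(ps -o rss= -p "$pid" | tr -d ' ')
    echo "$(date '+%Y-%m-%d %H:%M:%S') - Memory (RSS): ${rss} KB" >> /var/log/auditbeat-rss.log

The recorded values: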

2025-09-29 10:00:01 - Memory (RSS): 144884 KB
2025-09-29 11:00:01 - Memory (RSS): 171300 KB
2025-09-29 12:00:01 - Memory (RSS): 194336 KB
2025-09-29 13:00:01 - Memory (RSS): 215848 KB
2025-09-29 14:00:02 - Memory (RSS): 243052 KB
2025-09-29 15:00:01 - Memory (RSS): 260788 KB
2025-09-29 16:00:01 - Memory (RSS): 285976 KB
2025-09-29 17:00:01 - Memory (RSS): 309796 KB
2025-09-29 18:00:01 - Memory (RSS): 331676 KB
2025-09-29 19:00:01 - Memory (RSS): 349796 KB
...
2025-10-01 22:00:01 - Memory (RSS): 973280 KB
2025-10-01 23:00:01 - Memory (RSS): 983836 KB
2025-10-02 00:00:01 - Memory (RSS): 991660 KB
2025-10-02 01:00:01 - Memory (RSS): 999792 KB
2025-10-02 02:00:01 - Memory (RSS): 1007556 KB
2025-10-02 03:00:01 - Memory (RSS): 1018028 KB
2025-10-02 04:00:01 - Memory (RSS): 1024764 KB
2025-10-02 05:00:01 - Memory (RSS): 1034000 KB
2025-10-02 06:00:02 - Memory (RSS): 1040332 KB
2025-10-02 07:00:01 - Memory (RSS): 1048348 KB
2025-10-02 08:00:01 - Memory (RSS): 1057640 KB
2025-10-02 09:00:01 - Memory (RSS): 1067136 KB
...
2025-10-08 05:00:01 - Memory (RSS): 1725652 KB
2025-10-08 06:00:01 - Memory (RSS): 1730048 KB
2025-10-08 07:00:01 - Memory (RSS): 1733272 KB
2025-10-08 08:00:01 - Memory (RSS): 1734024 KB
2025-10-08 09:00:01 - Memory (RSS): 1738004 KB
2025-10-08 10:00:02 - Memory (RSS): 1744084 KB
2025-10-08 11:00:01 - Memory (RSS): 1744612 KB

Memory usage steadily increases until restart.

Additionally, when we configure the module with a backlog limit, for example:

- module: auditd
  backlog_limit: 2048

we observe that as memory consumption approaches the limit, disk I/O load increases significantly.

Could you please confirm if this is a known regression or if there are additional fixes planned beyond 8.19? Is there any recommended workaround to prevent Auditbeat from consuming excessive memory and causing high disk load until OOM kill or manual restart?

Thank you in advance.