Memory leak on Kubernetes nodes

Hi, I'm seeing a memory issue when running Auditbeat on Kubernetes nodes.
I have tried Auditbeat 8.12 and 8.13.
Here is the config I'm running:

auditbeat.modules:

- module: auditd
  processors:
    - add_session_metadata:
        backend: "auto"
    - add_docker_metadata:
    - drop_event:
        when:
          or:
            - has_fields: ['container']
            - contains:
                process.entry_leader.entry_meta.type: "container"
            - contains:
                process.entry_leader.args: "containerd"

  audit_rules: |
    -a exit,always -F arch=b64 -F euid=0 -S execve -k rootact
    -a exit,always -F arch=b32 -F euid=0 -S execve -k rootact
    -a always,exit -F arch=b64 -S connect -F a2=16 -F success=1 -F key=network_connect_4
    -a always,exit -F arch=b64 -F exe=/bin/bash -F success=1 -S connect -k "remote_shell"
    -a always,exit -F arch=b64 -F exe=/usr/bin/bash -F success=1 -S connect -k "remote_shell"

Auditbeat eats RAM over time until Linux OOM-kills it, and the process then requires a restart.

The sharp drops in memory consumption on the graph correspond to restarts of the Auditbeat service or of the server.

Thanks in advance.

Hi,

I think the large memory usage is likely being caused by the add_session_metadata processor. It tracks all process forks and executions in memory while it is in use. The memory usage should be bounded, but it can grow quite large.

There was a large architectural redesign of the add_session_metadata processor in 8.16 that may help with this. Could you upgrade to Auditbeat 8.16 or higher? Or if you're not using the data provided by this processor, you could remove it from the config.
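
If you do go the removal route, a minimal sketch of the module relying only on add_docker_metadata for the container filtering could look like the one below; the process.entry_leader.* conditions are dropped because those fields are populated by add_session_metadata. Whether the PID-based Docker metadata lookup catches all the container events you currently filter out is an assumption you would need to verify on your container runtime:

    auditbeat.modules:
      - module: auditd
        processors:
          # add_docker_metadata matches events by PID against the container
          # runtime and, on a hit, adds a `container` field to the event
          - add_docker_metadata:
          - drop_event:
              when:
                has_fields: ['container']
        # audit_rules stay exactly as before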

Yes, I also believe it is caused by the add_session_metadata processor.
We tried 8.16 and 8.17; it didn't help.

We use it to exclude events from inside containers from the audit. Can you recommend another way to ensure that only events caused by processes of the k8s node itself are audited, excluding containers?

Can you confirm it is indeed the add_session_metadata processor?
Running one instance without the processor for a while should be enough to see whether the pattern changes.

After that, can you get a memory profile? You will need to set http.pprof.enabled; see Configure an HTTP endpoint for metrics | Auditbeat Reference [8.17] | Elastic.
Just be sure to take the dump while the memory usage is high enough.
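
For reference, a minimal sketch of the relevant settings from that docs page, using what I believe are the default host and port (adjust to your environment):

    # expose the local monitoring endpoint and the Go pprof profiles
    http.enabled: true
    http.host: localhost
    http.port: 5066
    http.pprof.enabled: true

With that in place, the heap profile should be reachable at http://localhost:5066/debug/pprof/heap (for example via go tool pprof); grab it once the RSS is already high.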

Yes, I disabled the add_session_metadata processor and the memory leak stopped.

@dimaz can you try adding -a always,exit -F arch=b64 -S exit_group to your audit rules?

@Michael_Wolf It looks like if we don't get any exit_group events, the processor DB instance won't reap any dead processes?

I added -a always,exit -F arch=b64 -S exit_group and it had no effect on the memory leak.
Here is the memory profile I got by setting http.pprof.enabled.

master node:

{
  "auditd": {
    "kernel_lost": 31435,
    "reassembler_seq_gaps": 884763805174,
    "received_msgs": 1492935,
    "userspace_lost": 0
  },
  "beat": {
    "cgroup": {
      "cpu": {
        "cfs": {
          "period": {
            "us": 100000
          },
          "quota": {
            "us": 0
          }
        },
        "id": "auditbeat.service",
        "stats": {
          "periods": 0,
          "throttled": {
            "ns": 0,
            "periods": 0
          }
        }
      },
      "cpuacct": {
        "id": "auditbeat.service",
        "total": {
          "ns": 341370860453
        }
      },
      "memory": {
        "id": "auditbeat.service",
        "mem": {
          "limit": {
            "bytes": 9223372036854771712
          },
          "usage": {
            "bytes": 1127354368
          }
        }
      }
    },
    "cpu": {
      "system": {
        "ticks": 46590,
        "time": {
          "ms": 46590
        }
      },
      "total": {
        "ticks": 341360,
        "time": {
          "ms": 341360
        },
        "value": 341360
      },
      "user": {
        "ticks": 294770,
        "time": {
          "ms": 294770
        }
      }
    },
    "handles": {
      "limit": {
        "hard": 262144,
        "soft": 262144
      },
      "open": 26
    },
    "info": {
      "ephemeral_id": "13f39bfc-d91f-43f0-9a95-21636139775d",
      "name": "auditbeat",
      "uptime": {
        "ms": 190182143
      },
      "version": "8.17.0"
    },
    "memstats": {
      "gc_next": 40572544,
      "memory_alloc": 26293464,
      "memory_sys": 79088195,
      "memory_total": 56586289080,
      "rss": 1112170496
    },
    "runtime": {
      "goroutines": 27
    }
  },
  "libbeat": {
    "config": {
      "module": {
        "running": 0,
        "starts": 0,
        "stops": 0
      },
      "reloads": 0,
      "scans": 0
    },
    "output": {
      "batches": {
        "split": 0
      },
      "events": {
        "acked": 316014,
        "active": 0,
        "batches": 30604,
        "dead_letter": 0,
        "dropped": 0,
        "duplicates": 0,
        "failed": 23,
        "toomany": 0,
        "total": 316037
      },
      "read": {
        "bytes": 1648146,
        "errors": 1
      },
      "type": "logstash",
      "write": {
        "bytes": 126110743,
        "errors": 1,
        "latency": {
          "histogram": {
            "count": 0,
            "max": 0,
            "mean": 0,
            "median": 0,
            "min": 0,
            "p75": 0,
            "p95": 0,
            "p99": 0,
            "p999": 0,
            "stddev": 0
          }
        }
      }
    },
    "pipeline": {
      "clients": 1,
      "events": {
        "active": 2,
        "dropped": 0,
        "failed": 0,
        "filtered": 0,
        "published": 316015,
        "retry": 53,
        "total": 316016
      },
      "queue": {
        "acked": 316014,
        "added": {
          "bytes": 0,
          "events": 316015
        },
        "consumed": {
          "bytes": 0,
          "events": 316014
        },
        "filled": {
          "bytes": 0,
          "events": 1,
          "pct": 0.00048828125
        },
        "max_bytes": 0,
        "max_events": 2048,
        "removed": {
          "bytes": 0,
          "events": 316014
        }
      }
    }
  },
  "metricbeat": {
    "auditd": {
      "auditd": {
        "consecutive_failures": 0,
        "events": 316017,
        "failures": 116,
        "success": 315903
      }
    }
  },
  "system": {
    "cpu": {
      "cores": 2
    },
    "load": {
      "1": 0.34,
      "15": 0.28,
      "5": 0.27,
      "norm": {
        "1": 0.17,
        "15": 0.14,
        "5": 0.135
      }
    }
  }
}

worker node:

{
  "auditd": {
    "kernel_lost": 22594,
    "reassembler_seq_gaps": 5793911785175,
    "received_msgs": 5855068,
    "userspace_lost": 0
  },
  "beat": {
    "cgroup": {
      "cpu": {
        "cfs": {
          "period": {
            "us": 100000
          },
          "quota": {
            "us": 0
          }
        },
        "id": "auditbeat.service",
        "stats": {
          "periods": 0,
          "throttled": {
            "ns": 0,
            "periods": 0
          }
        }
      },
      "cpuacct": {
        "id": "auditbeat.service",
        "total": {
          "ns": 1031281329977
        }
      },
      "memory": {
        "id": "auditbeat.service",
        "mem": {
          "limit": {
            "bytes": 9223372036854771712
          },
          "usage": {
            "bytes": 5337092096
          }
        }
      }
    },
    "cpu": {
      "system": {
        "ticks": 139370,
        "time": {
          "ms": 139370
        }
      },
      "total": {
        "ticks": 1031270,
        "time": {
          "ms": 1031270
        },
        "value": 1031270
      },
      "user": {
        "ticks": 891900,
        "time": {
          "ms": 891900
        }
      }
    },
    "handles": {
      "limit": {
        "hard": 262144,
        "soft": 262144
      },
      "open": 58
    },
    "info": {
      "ephemeral_id": "057fbe51-89e4-4f37-8bff-6663e94dd299",
      "name": "auditbeat",
      "uptime": {
        "ms": 318090029
      },
      "version": "8.17.0"
    },
    "memstats": {
      "gc_next": 48628040,
      "memory_alloc": 32350416,
      "memory_sys": 127224387,
      "memory_total": 136483867096,
      "rss": 5384351744
    },
    "runtime": {
      "goroutines": 29
    }
  },
  "libbeat": {
    "config": {
      "module": {
        "running": 0,
        "starts": 0,
        "stops": 0
      },
      "reloads": 0,
      "scans": 0
    },
    "output": {
      "batches": {
        "split": 0
      },
      "events": {
        "acked": 852111,
        "active": 0,
        "batches": 54461,
        "dead_letter": 0,
        "dropped": 0,
        "duplicates": 0,
        "failed": 97,
        "toomany": 0,
        "total": 852208
      },
      "read": {
        "bytes": 2701408,
        "errors": 2
      },
      "type": "logstash",
      "write": {
        "bytes": 382956376,
        "errors": 3,
        "latency": {
          "histogram": {
            "count": 0,
            "max": 0,
            "mean": 0,
            "median": 0,
            "min": 0,
            "p75": 0,
            "p95": 0,
            "p99": 0,
            "p999": 0,
            "stddev": 0
          }
        }
      }
    },
    "pipeline": {
      "clients": 1,
      "events": {
        "active": 5,
        "dropped": 0,
        "failed": 0,
        "filtered": 24376,
        "published": 852115,
        "retry": 258,
        "total": 876492
      },
      "queue": {
        "acked": 852111,
        "added": {
          "bytes": 0,
          "events": 852115
        },
        "consumed": {
          "bytes": 0,
          "events": 852111
        },
        "filled": {
          "bytes": 0,
          "events": 4,
          "pct": 0.001953125
        },
        "max_bytes": 0,
        "max_events": 2048,
        "removed": {
          "bytes": 0,
          "events": 852111
        }
      }
    }
  },
  "metricbeat": {
    "auditd": {
      "auditd": {
        "consecutive_failures": 0,
        "events": 876493,
        "failures": 959,
        "success": 875538
      }
    }
  },
  "system": {
    "cpu": {
      "cores": 10
    },
    "load": {
      "1": 0.1,
      "15": 0.12,
      "5": 0.1,
      "norm": {
        "1": 0.01,
        "15": 0.012,
        "5": 0.01
      }
    }
  }
}

Alright, I can't reproduce the full leak, but I suspect there's some kind of system-dependent behavior going on here.

@dimaz

The docs mention setting a number of audit rules specifically for this processor. Can you make sure they're all set?

    ## executions
    -a always,exit -F arch=b64 -S execve,execveat -k exec
    -a always,exit -F arch=b64 -S exit_group
    ## set_sid
    -a always,exit -F arch=b64 -S setsid
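
Merged into the audit_rules block you posted, that could look roughly like the sketch below. Note that the execve,execveat rule from the docs is not limited to euid=0, unlike your existing exec rules, so the event volume may go up:

    audit_rules: |
      ## existing rules
      -a exit,always -F arch=b64 -F euid=0 -S execve -k rootact
      -a exit,always -F arch=b32 -F euid=0 -S execve -k rootact
      -a always,exit -F arch=b64 -S connect -F a2=16 -F success=1 -F key=network_connect_4
      -a always,exit -F arch=b64 -F exe=/bin/bash -F success=1 -S connect -k "remote_shell"
      -a always,exit -F arch=b64 -F exe=/usr/bin/bash -F success=1 -S connect -k "remote_shell"
      ## rules needed by add_session_metadata
      -a always,exit -F arch=b64 -S execve,execveat -k exec
      -a always,exit -F arch=b64 -S exit_group
      -a always,exit -F arch=b64 -S setsid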

Also, can you tell us more about the environment you're running on? Is this a K8s cloud service? Is there a host where you can run uname -a and share the output?

We don't have the rule
-a always,exit -F arch=b64 -S setsid
so I'll add it.

These are our own on-premises Kubernetes servers.

# uname -a
Linux hostname.example.com 5.15.0-210.163.7.el8uek.x86_64

When server activity is low, the memory grows slowly; the leak accelerates as the load increases.

So, the good news is that I think I've reproduced this. It looks like different components are keying off different values in the database that the processor uses. I still haven't found a workaround.

Currently working on a fix.