Metricbeat v7.3.1 memory leak

Hi,
We are using ELK 7.3.1 installed on a Kubernetes cluster. For Metricbeat we use the official Docker image, in this case docker.elastic.co/beats/metricbeat-oss:7.3.1.

Ever since migrating to this new version, Metricbeat shows a constant increase in memory usage without ever releasing memory. Once it reaches its limit, it is restarted by Kubernetes. Does anyone else have this problem, and how could it be resolved?
Thanks


Hi @Milan_Todorovic :slightly_smiling_face:

Can you share the configuration you were using and the modules that were activated? There are no code differences in the OSS version; it just leaves out some modules, which should not impact performance.

For processors, we are using add_docker_metadata.
We also set cleanup_timeout: 1s.
For modules, we are using the logstash, elasticsearch, jolokia, docker, and kubernetes modules, all sending data at an interval of 30s.
The jolokia module is configured under autodiscover, like this:
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      ...
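
For completeness, here is a rough sketch of what this setup looks like. Only the module names, the 30s period, and cleanup_timeout: 1s come from the description above; the hosts, metricsets, labels, and ports below are placeholders, not our real values.

metricbeat.modules:
  # Placeholders throughout; only the module list and the 30s period match our setup.
  - module: elasticsearch
    metricsets: ["node", "node_stats"]
    period: 30s
    hosts: ["http://elasticsearch:9200"]
  - module: logstash
    metricsets: ["node", "node_stats"]
    period: 30s
    hosts: ["logstash:9600"]
  - module: docker
    metricsets: ["container", "cpu", "memory", "network"]
    period: 30s
    hosts: ["unix:///var/run/docker.sock"]
  - module: kubernetes
    metricsets: ["node", "pod", "container"]
    period: 30s
    hosts: ["localhost:10255"]

metricbeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            contains:
              kubernetes.labels.app: "my-java-app"        # placeholder label
          config:
            - module: jolokia
              metricsets: ["jmx"]
              period: 30s
              hosts: ["${data.host}:8778"]                # placeholder Jolokia port
              path: "/jolokia/?ignoreErrors=true&canonicalNaming=false"
              namespace: "jvm"
              jmx.mappings:
                - mbean: "java.lang:type=Memory"
                  attributes:
                    - attr: HeapMemoryUsage
                      field: memory.heap_usage

processors:
  - add_docker_metadata:
      cleanup_timeout: 1s
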
Thanks for any ideas.

Some logs would be useful, along with more information about how high this memory limit is. Modules like Jolokia can take a huge amount of memory depending on how many metrics you have configured. The same goes for the Docker module (it depends on the number of containers).

Can you provide some metrics about this? I'm not sure if the cleanup_timeout parameter is too aggressive. Have you tried raising it a bit, to 10 seconds for example, just to check whether the memory still behaves like this?
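
Something like this, purely as an illustration (the same processor you already have, only the timeout changed):

processors:
  - add_docker_metadata:
      # less aggressive cleanup, just to see whether the memory growth slows down
      cleanup_timeout: 10s
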

By the way, have you tried, just as an experiment, using the Metricbeat image shipped with the Basic license? Just to check whether that solves the problem (it shouldn't, but just in case).

The logs do not contain anything out of the ordinary. Also, this "memory leak" is not a sudden jump; it takes days to reach our defined memory limit. The upper limit in this case is 400MB. On our previous version, 6.8, we did not experience this problem and Metricbeat used 200MB constantly. The problem only emerged after migrating to version 7.3.1.
Also, we have a small number of containers that really can't overrun Metricbeat.
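
For reference, the limit is applied on the Metricbeat DaemonSet roughly like this (a sketch only; the container name and the request value are illustrative, the ~400MB ceiling is the limit described above):

containers:
  - name: metricbeat                                     # illustrative name
    image: docker.elastic.co/beats/metricbeat-oss:7.3.1
    resources:
      requests:
        memory: "200Mi"   # roughly what Metricbeat consumed on 6.8
      limits:
        memory: "400Mi"   # the ceiling we hit on 7.3.1, after which Kubernetes restarts the pod
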

2019-10-30T01:41:01.195-0700 INFO instance/beat.go:606 Home path: [/usr/share/metricbeat] Config path: [/usr/share/metricbeat] Data path: [/usr/share/metricbeat/data] Logs path: [/usr/share/metricbeat/logs]
2019-10-30T01:41:01.197-0700 INFO instance/beat.go:614 Beat ID: b041c16f-b921-4375-b7ac-ee950472db40
2019-10-30T01:41:01.494-0700 INFO [seccomp] seccomp/seccomp.go:124 Syscall filter successfully installed
2019-10-30T01:41:01.495-0700 INFO [beat] instance/beat.go:902 Beat info {"system_info": {"beat": {"path": {"config": "/usr/share/metricbeat", "data": "/usr/share/metricbeat/data", "home": "/usr/share/metricbeat", "logs": "/usr/share/metricbeat/logs"}, "type": "metricbeat", "uuid": "b041c16f-b921-4375-b7ac-ee950472db40"}}}
2019-10-30T01:41:01.495-0700 INFO [beat] instance/beat.go:911 Build info {"system_info": {"build": {"commit": "a4be71b90ce3e3b8213b616adfcd9e455513da45", "libbeat": "7.3.1", "time": "2019-08-19T19:20:02.000Z", "version": "7.3.1"}}}
2019-10-30T01:41:01.495-0700 INFO [beat] instance/beat.go:914 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":88,"version":"go1.12.4"}}}
2019-10-30T01:41:01.503-0700 INFO [beat] instance/beat.go:918 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2019-05-20T02:24:20-07:00","containerized":false,"name":"metricbeat-mbpm4","ip":["127.0.0.1/8","10.244.5.192/24"],"kernel_version":"4.19.15-1.1.el7.x86_64","mac":["0a:58:0a:f4:05:c0"],"os":{"family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":6,"patch":1810,"codename":"Core"},"timezone":"PDT","timezone_offset_sec":-25200}}}
2019-10-30T01:41:01.503-0700 INFO [beat] instance/beat.go:947 Process info {"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"permitted":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"effective":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/usr/share/metricbeat", "exe": "/usr/share/metricbeat/metricbeat", "name": "metricbeat", "pid": 1, "ppid": 0, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2019-10-30T01:41:00.140-0700"}}}
2019-10-30T01:41:01.503-0700 INFO instance/beat.go:292 Setup Beat: metricbeat; Version: 7.3.1
2019-10-30T01:41:01.503-0700 INFO [index-management] idxmgmt/std.go:178 Set output.elasticsearch.index to 'metricbeat-7.3.1' as ILM is enabled.
2019-10-30T01:41:01.504-0700 INFO elasticsearch/client.go:170 Elasticsearch url: http://elasticsearch:9200
2019-10-30T01:41:01.504-0700 INFO [publisher] pipeline/module.go:97 Beat name: metricbeat-mbpm4
2019-10-30T01:41:01.505-0700 INFO kubernetes/util.go:86 kubernetes: Using pod name metricbeat-mbpm4 and namespace default to discover kubernetes node
2019-10-30T01:41:01.610-0700 INFO kubernetes/util.go:93 kubernetes: Using node <our_node> discovered by in cluster pod node query
2019-10-30T01:41:01.611-0700 INFO kubernetes/util.go:86 kubernetes: Using pod name metricbeat-mbpm4 and namespace default to discover kubernetes node
2019-10-30T01:41:01.626-0700 INFO kubernetes/util.go:93 kubernetes: Using node <our_node> discovered by in cluster pod node query
2019-10-30T01:41:01.628-0700 WARN [cfgwarn] kubernetes/kubernetes.go:55 BETA: The kubernetes autodiscover is beta
2019-10-30T01:41:01.629-0700 INFO kubernetes/util.go:86 kubernetes: Using pod name metricbeat-mbpm4 and namespace default to discover kubernetes node
2019-10-30T01:41:01.699-0700 INFO kubernetes/util.go:93 kubernetes: Using node <our_node> discovered by in cluster pod node query
2019-10-30T01:41:01.699-0700 INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2019-10-30T01:41:01.699-0700 INFO instance/beat.go:421 metricbeat start running.
2019-10-30T01:41:01.699-0700 INFO [autodiscover] autodiscover/autodiscover.go:105 Starting autodiscover manager
2019-10-30T01:41:01.700-0700 INFO kubernetes/watcher.go:182 kubernetes: Performing a resource sync for *v1.PodList
2019-10-30T01:41:02.199-0700 INFO kubernetes/watcher.go:198 kubernetes: Resource sync done
2019-10-30T01:41:02.199-0700 INFO kubernetes/watcher.go:242 kubernetes: Watching API for resource events
2019-10-30T01:41:02.796-0700 INFO pipeline/output.go:95 Connecting to backoff(elasticsearch(http://elasticsearch:9200))
2019-10-30T01:41:02.802-0700 INFO elasticsearch/client.go:743 Attempting to connect to Elasticsearch version 7.3.1
2019-10-30T01:41:03.002-0700 INFO template/load.go:169 Existing template will be overwritten, as overwrite is enabled.
2019-10-30T01:41:03.919-0700 INFO template/load.go:108 Try loading template metricbeat-7.3.1 to Elasticsearch
2019-10-30T01:41:04.264-0700 INFO template/load.go:100 template with name 'metricbeat-7.3.1' loaded.
2019-10-30T01:41:04.264-0700 INFO [index-management] idxmgmt/std.go:289 Loaded index template.
2019-10-30T01:41:04.265-0700 INFO pipeline/output.go:105 Connection to backoff(elasticsearch(http://elasticsearch:9200)) established
2019-10-30T01:41:05.687-0700 INFO kubernetes/watcher.go:182 kubernetes: Performing a resource sync for *v1.PodList
2019-10-30T01:41:05.693-0700 INFO kubernetes/watcher.go:198 kubernetes: Resource sync done
2019-10-30T01:41:05.694-0700 INFO kubernetes/watcher.go:242 kubernetes: Watching API for resource events
2019-10-30T01:41:06.739-0700 INFO kubernetes/watcher.go:182 kubernetes: Performing a resource sync for *v1.PodList
2019-10-30T01:41:06.745-0700 INFO kubernetes/watcher.go:198 kubernetes: Resource sync done
2019-10-30T01:41:06.746-0700 INFO kubernetes/watcher.go:242 kubernetes: Watching API for resource events
2019-10-30T01:41:10.563-0700 INFO kubernetes/watcher.go:182 kubernetes: Performing a resource sync for *v1.NodeList
2019-10-30T01:41:10.578-0700 INFO kubernetes/watcher.go:198 kubernetes: Resource sync done
2019-10-30T01:41:10.578-0700 INFO kubernetes/watcher.go:242 kubernetes: Watching API for resource events
2019-10-30T01:41:31.705-0700 INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s

Can you try upgrading to 7.4? It seems this was a known issue and it has since been fixed:

From the link you provided, it seems this fix is planned for Metricbeat 7.5, not 7.4. Am I correct about this? Thanks

7.4.2, 7.3.3 and 7.5.0 yes :slightly_smiling_face:
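
So the upgrade amounts to bumping the image tag on the DaemonSet to one of those releases, for example (a sketch; adjust the tag and edition to your own setup):

containers:
  - name: metricbeat
    image: docker.elastic.co/beats/metricbeat-oss:7.3.3   # or 7.4.2 / 7.5.0; OSS image shown to match the original setup
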


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.