I have metricbeat installed on a Windows Server 2016 Datacenter server that also has an Elasticsearch node.
I also have metricbeat installed on a Windows Server 2012 Standard server, which likewise runs an Elasticsearch node in the same Elasticsearch cluster as the 2016 server.
Both on Elastic Stack 7.1.1.
Only the system module is enabled and configured identically on both servers:
# Module: system
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.1/metricbeat-module-system.html

- module: system
  period: 10s
  metricsets:
    - cpu
    #- load
    - memory
    - network
    - process
    - process_summary
    - socket_summary
    #- core
    #- diskio
    #- socket
  process.include_top_n:
    by_cpu: 20      # include top 20 processes by CPU
    by_memory: 20   # include top 20 processes by memory

- module: system
  period: 1m
  metricsets:
    - filesystem
    - fsstat
  processors:
    - drop_event.when.regexp:
        system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'

- module: system
  period: 15m
  metricsets:
    - uptime

#- module: system
#  period: 5m
#  metricsets:
#    - raid
#  raid.mount_point: '/'
At this point metricbeat has been running for around 1 week, and the memory usage on the 2016 server is > 3GB:
On the 2012 server it is significantly lower:
Any clue as to why this is happening only on the 2016 server?
Area chart of the memory usage in the past week:
I too have the same issue with every 7.x Metricbeat I've tried. The only thing I've changed from the default .yaml files is the elasticsearch host. I did try changing the collection period from 10s to 60s, and that causes a much slower increase in RAM usage over time.
I increased the period for the first metricset from 10s to 60s and it does appear to slow down the growth:
Top graph is the 2016 server, bottom graph is the 2012 server.
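For reference, the only edit was the period on the first system module block; a minimal sketch of the relevant lines (metricsets unchanged from the config above):
- module: system
  period: 60s   # increased from 10s; appears to slow, but not stop, the memory growth
  metricsets:
    - cpu
    - memory
    - network
    - process
    - process_summary
    - socket_summary
  process.include_top_n:
    by_cpu: 20
    by_memory: 20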
I've installed Metricbeat 7.1.1 on a Windows Server 2016 Standard server (QA-MS2018) that has nothing running on it aside from serving as a Hyper-V host, and it appears to exhibit the same behavior. In under 24 hours Metricbeat's memory usage has grown to over 500MB.
Server 2016 running Metricbeat 7.1.1; that's over the last 7 days. I have multiple hosts exhibiting the same problem. The drop is where I killed the service; I'm not sure it would have recovered by itself, but I can't just wait around for that to happen on that server.
Edit: this time series shows only the Metricbeat process memory utilization. The server itself reached critical levels of memory utilization, which is what alerted me to check and then end the process. Metricbeat seems to just keep consuming memory, never releasing it, until there's none left.
Edit: this time series shows the WmiPrvSE.exe process using huge amounts of CPU on Windows Server 2008 R2 running Metricbeat 7.1.1. This is consistent across all 20 or so of these servers that I'm running, and it did not happen with Metricbeat 6.5.4 using the same config settings. The usage falls away entirely once the Metricbeat service is stopped. So it looks like Metricbeat 7.1.1 also has a problem on Windows Server 2008 R2, except instead of consuming huge amounts of RAM, it thrashes the WMI process. I have not altered the collection interval or which elements are monitored between Metricbeat versions. That's a LOT of CPU for a monitoring service to use all by itself; it's not good to monitor a server if the monitoring application itself causes resource exhaustion on that server.
Note: this graph shows ONLY the WmiPrvSE.exe process by itself.
The graph below shows the total CPU over that same period.
Here's a Windows Server 2012 R2 server that also sees high CPU utilization from Metricbeat 7.1.1. It did not have this problem when running 6.5.4, and the config details were not altered. The graph below shows those two processes stacked as a % of total CPU.
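A side note for anyone building similar per-process charts: the system module's process metricset also accepts a processes option (a list of regular expressions which, as I understand it, are matched against process names), so a stripped-down config could report on just the two processes discussed here. A minimal sketch, with illustrative regex values:
- module: system
  period: 60s
  metricsets:
    - process
  # Report only processes whose names match these regexes (illustrative values)
  processes: ['metricbeat', 'WmiPrvSE']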
Preliminary results from running Metricbeat on a Windows Server 2019 VM and a Windows 10 machine (which also has an Elasticsearch node installed) over the last 24 hours show no noticeable increase in memory consumption.
This issue may just be isolated to Windows Server 2016.
I configured one of the Windows Server 2016 servers to only send the process metricset and it appears to exhibit the same growth rate as when I had the default metricsets enabled.
# Module: system
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.1/metricbeat-module-system.html

- module: system
  period: 10s
  metricsets:
    #- cpu
    #- load
    #- memory
    #- network
    - process
    #- process_summary
    #- socket_summary
    #- core
    #- diskio
    #- socket
  process.include_top_n:
    by_cpu: 20      # include top 20 processes by CPU
    by_memory: 20   # include top 20 processes by memory

- module: system
  period: 1m
  metricsets:
    - filesystem
    - fsstat
  processors:
    - drop_event.when.regexp:
        system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'

- module: system
  period: 15m
  metricsets:
    - uptime

#- module: system
#  period: 5m
#  metricsets:
#    - raid
#  raid.mount_point: '/'
Scaled to the number of cores, it doesn't appear to be eating too much CPU time... though these are servers with a high number of CPU cores.
QA-MS2018 has 24 cores/48 threads.
QA-DM-HQS-2012 has 12 cores/24 threads.
DEV-AP-2016-DC has 8 cores/8 threads.
When I use system.process.cpu.total.pct instead of system.process.cpu.total.norm.pct it does appear to use quite a lot:
The initial portion was when I had the interval set to 10s. When I noticed the high CPU usage of WmiPrvSE.exe, I changed the interval to 60s and it seems to have lowered the usage slightly.
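For a rough sense of scale (using the documented relationship between the two fields, with illustrative numbers): system.process.cpu.total.norm.pct is the same measurement normalized by the number of CPU cores, so on the 48-thread QA-MS2018 host a non-normalized reading of 3.0 (i.e. 300% of one core) works out to roughly 3.0 / 48 ≈ 0.0625, or about 6% in the normalized field. That's why the norm.pct graphs look modest while the raw pct graphs look alarming on these high-core-count machines.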
Thanks [wisdomgt], I'll give that a test in my environment in the next week and see what happens with Server 2012 R2 and Server 2008 R2 and their CPU usage. In the meantime I have rolled back to Metricbeat 6.5.4, since that was nice and reliable.
Edit: from the release notes, it looks like 7.2 fixed both the CPU usage and memory leak problems...