High CPU usage on warms caused by metricbeat

Jesbourne · October 5, 2020, 10:15pm

We've been troubleshooting an issue for several months that seems to be related to metricbeat. This weekend, within a few hours of starting metricbeat on our warms, hots, percolators and clients (we do not run it on the masters), we saw a large increase in CPU usage on the warms.

This started at 8pm on a Friday, so we know it's not related to traffic increases. CPU usage on the warms stayed bottlenecked at around 100% until the ES URL went down completely on Sunday night, right as we requested some automated backups and moves from hot to warm.

Every time I've tried to start metricbeat in the last few weeks, we've had an unexplained ES outage within a few days. The symptoms aren't quite the same each time. Sometimes we see a CPU increase on our ingestion service instead, which causes a different type of outage, but high CPU usage seems to be the common thread. Stopping metricbeat stabilizes the service every time. Is there a different way we can configure metricbeat that is less risky and won't cause outages?

We were running metricbeat on our masters previously, until it caused high CPU usage on the masters themselves, and now we get these stats from a node in our cluster that isn't running ES. This was working well for several months until now.

We are on ES and metricbeat/kibana 7.5.2. This is our /etc/metricbeat/metricbeat.yml, with sensitive values replaced by XXXXX:

output.elasticsearch:
  hosts: ["es-monitoring.XXXXX.com:9200"]
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
setup.template.overwrite: true
setup.ilm.overwrite: true
setup.ilm.policy_file: "/etc/metricbeat/metricbeat-ilm-policy.conf"
setup.template.settings:
  index.number_of_shards: 1
  index.number_of_replicas: 0
setup.kibana:
  host: "kibana.XXXXX.com:5601"
  username: XXXXX
  password: XXXXX
metricbeat.modules:
- module: elasticsearch
  metricsets:
    - ccr
    - enrich
    - cluster_stats
    - index
    - index_recovery
    - index_summary
    - ml_job
    - node_stats
    - shard
  hosts: ["http://localhost:9200"]
  period: 180s
  username: XXXXX
  password: XXXXX
  xpack.enabled: true
  ilm.enabled: true
  ilm.rollover_alias: "metricbeat"

The warm nodes are not behind es-monitoring; that URL is just a single node running a separate backend cluster. The warm nodes are at es1-url for us. Metricbeat only touches the warm nodes as a local process running on each of them, and all the data it collects is shipped elsewhere. The node that monitors the masters has almost the same config, except that hosts: is set directly to a list of master IPs and period: is 120s.
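As a rough sketch of that difference, the module section on the master-monitoring node would look something like the fragment below. The addresses are placeholders from a documentation IP range, not our real master IPs, and both the count of three masters and port 9200 are assumptions made for illustration. Only the keys that differ are shown.

```yaml
metricbeat.modules:
- module: elasticsearch
  # metricsets, credentials and the remaining keys are almost the same
  # as in the config above and are left out of this sketch.
  # Placeholder addresses (assumption): substitute the real master IPs.
  hosts:
    - "http://192.0.2.11:9200"
    - "http://192.0.2.12:9200"
    - "http://192.0.2.13:9200"
  # The masters are polled every 120s instead of every 180s.
  period: 120s
```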

Any tips on metricbeat would be very appreciated.

Kaiyan_Sheng · October 5, 2020, 10:29pm

What metricbeat version are you using and also what OS are you running it on? Thank you!

system · November 3, 2020, 12:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
