Metricbeat CPU

I've been running the daily snapshot 5.0.0a5 for the past 24 hours, and metricbeat.exe has started to use a lot more CPU than I would like.

I've been collecting the following:

- module: system
  metricsets:
    - cpu
    - diskio
    - process
  period: 20s
  processes: ['filebeat.exe', 'metricbeat.exe']
  cpu_ticks: false

- module: system
  metricsets:
    - filesystem
  enabled: true
  period: 300s

- module: system
  metricsets:
    - memory
  enabled: true
  period: 60s

processors:
  - drop_fields:
      fields: ["system.process.cmdline", "system.process.cpu.start_time", "metricset.rtt"]

Output goes to Elasticsearch.

This is the usage of metricbeat.exe

The log file shows the following around the time of the three larger spikes:

 2016-07-07T12:04:24-04:00 INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=2 fetches.system-cpu.success=2 libbeat.es.publish.write_bytes=1269 libbeat.es.published_and_acked_events=2 libbeat.es.call_count.PublishEvents=2 libbeat.es.publish.read_bytes=596 libbeat.publisher.messages_in_worker_queues=2 fetches.system-cpu.events=2
 2016-07-07T12:04:54-04:00 INFO Non-zero metrics in the last 30s: fetches.system-process.success=3 libbeat.es.publish.write_bytes=44549 libbeat.publisher.published_events=95 fetches.system-diskio.success=3 fetches.system-memory.success=1 fetches.system-diskio.events=3 libbeat.es.call_count.PublishEvents=7 fetches.system-cpu.events=1 fetches.system-cpu.success=1 fetches.system-process.events=90 libbeat.es.publish.read_bytes=17400 libbeat.publisher.messages_in_worker_queues=95 fetches.system-memory.events=1 libbeat.es.published_and_acked_events=95

 2016-07-07T12:21:24-04:00 INFO Non-zero metrics in the last 30s: libbeat.publisher.messages_in_worker_queues=34 libbeat.es.publish.write_bytes=16129 fetches.system-cpu.success=2 libbeat.es.published_and_acked_events=34 libbeat.es.call_count.PublishEvents=4 libbeat.es.publish.read_bytes=6411 fetches.system-process.success=1 fetches.system-diskio.events=2 fetches.system-diskio.success=2 fetches.system-cpu.events=2 fetches.system-process.events=30 libbeat.publisher.published_events=34
 2016-07-07T12:21:54-04:00 INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=63 fetches.system-process.success=2 libbeat.es.publish.read_bytes=11460 libbeat.es.publish.write_bytes=29424 fetches.system-cpu.success=1 fetches.system-cpu.events=1 fetches.system-diskio.success=2 libbeat.es.call_count.PublishEvents=4 fetches.system-diskio.events=2 fetches.system-memory.success=1 fetches.system-process.events=59 libbeat.publisher.messages_in_worker_queues=63 fetches.system-memory.events=1 libbeat.es.published_and_acked_events=63

 2016-07-07T12:27:24-04:00 INFO Non-zero metrics in the last 30s: libbeat.es.publish.write_bytes=15771 libbeat.publisher.published_events=33 fetches.system-cpu.success=2 fetches.system-process.success=1 libbeat.es.publish.read_bytes=6239 fetches.system-diskio.events=1 fetches.system-process.events=30 libbeat.es.call_count.PublishEvents=4 fetches.system-cpu.events=2 fetches.system-diskio.success=1 libbeat.es.published_and_acked_events=33 libbeat.publisher.messages_in_worker_queues=33
 2016-07-07T12:27:54-04:00 INFO Non-zero metrics in the last 30s: fetches.system-memory.success=1 libbeat.es.published_and_acked_events=63 fetches.system-cpu.events=1 libbeat.es.publish.write_bytes=29537 fetches.system-diskio.success=1 fetches.system-process.events=60 fetches.system-diskio.events=1 fetches.system-memory.events=1 fetches.system-process.success=2 libbeat.publisher.messages_in_worker_queues=63 libbeat.publisher.published_events=63 fetches.system-cpu.success=1 libbeat.es.publish.read_bytes=11461 libbeat.es.call_count.PublishEvents=4

Is there anything else I can check to see why the CPU is increasing?

Thanks

I haven't seen this on my machine, but I'm running the alpha4 release on Linux, so it may come down to differences on Windows. I'll install the snapshot release and see if I get similar results on Linux.

You could connect to the live process with a profiler, or you could configure the process to dump CPU profiling information to a file. I prefer live profiling. To expose the HTTP endpoint, add a CLI flag when starting the Beat: -httpprof "localhost:6060". Then connect with the pprof tool: go tool pprof http://localhost:6060/debug/pprof/profile. It might be difficult to find the cause. This is a good how-to: https://blog.golang.org/profiling-go-programs
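If you would rather capture a profile to a file and inspect it later, something like the sketch below could work. It assumes the Beat was started with -httpprof "localhost:6060" as described above and relies on the standard Go pprof endpoint accepting a seconds query parameter. The function names and the output file name are made up for illustration.

```python
import urllib.request
from urllib.parse import urlencode


def profile_url(address="localhost:6060", seconds=30):
    """Build the URL of the Go pprof CPU-profile endpoint."""
    return f"http://{address}/debug/pprof/profile?" + urlencode({"seconds": seconds})


def save_cpu_profile(path, address="localhost:6060", seconds=30):
    """Download one CPU profile and write it to path.

    The request blocks for roughly `seconds` while the Beat samples itself.
    Returns the number of bytes written.
    """
    # Allow extra time beyond the sampling window before giving up.
    with urllib.request.urlopen(profile_url(address, seconds), timeout=seconds + 30) as resp:
        data = resp.read()
    with open(path, "wb") as out:
        out.write(data)
    return len(data)


# Example (hypothetical file name; needs a Beat running with -httpprof):
#   save_cpu_profile("metricbeat-cpu.prof")
```

The saved file can then be opened with go tool pprof, passing the metricbeat binary and the profile file. Capturing one profile during a spike and one during a quiet period makes the comparison easier.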

Another approach might be to set the period lower to accelerate the problem and then try each metricset individually (or try removing a single metricset) to see if you can isolate the issue to one metricset. Just trying to throw out some ideas :)
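To make the isolation runs less tedious, a small script could print one module entry per trial. This is only a sketch: the 1s period is an arbitrary short value I picked, the helper names are hypothetical, and options such as processes would still need to be added by hand for the process metricset.

```python
# Metricsets taken from the config posted above.
METRICSETS = ["cpu", "diskio", "process", "filesystem", "memory"]


def isolation_trials(metricsets):
    """Return trial lists: each metricset alone, then each one left out."""
    alone = [[name] for name in metricsets]
    leave_one_out = [[m for m in metricsets if m != name] for name in metricsets]
    return alone + leave_one_out


def module_config(metricsets, period="1s"):
    """Render one system-module entry as YAML text for a trial run."""
    lines = ["- module: system", "  metricsets:"]
    lines += [f"    - {name}" for name in metricsets]
    lines += ["  enabled: true", f"  period: {period}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print every trial entry; paste one at a time into the Metricbeat config.
    for trial in isolation_trials(METRICSETS):
        print(module_config(trial))
        print()
```

Run Metricbeat with one printed entry at a time and watch the CPU graph. If the usage only climbs in trials that include a particular metricset, that metricset is the likely culprit.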

From past experience, my first suspect would be the system/process metricset, since you are on Windows.
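One rough way to check that suspicion against the "Non-zero metrics" lines quoted above is to tabulate the counters per log line. The sketch below assumes the log format shown in the question; the function names are hypothetical.

```python
import re

# Matches "key=value" pairs such as "fetches.system-process.events=90".
PAIR = re.compile(r"([\w.\-]+)=(\d+)")
MARKER = "Non-zero metrics in the last 30s:"


def parse_metrics_line(line):
    """Return (timestamp, {counter: value}) for one metrics log line, else None."""
    if MARKER not in line:
        return None
    head, _, tail = line.partition(MARKER)
    parts = head.split()
    timestamp = parts[0] if parts else ""
    counters = {key: int(value) for key, value in PAIR.findall(tail)}
    return timestamp, counters


def events_per_fetch(counters, metricset):
    """Average events per successful fetch for a metricset like 'system-process'."""
    fetches = counters.get(f"fetches.{metricset}.success", 0)
    events = counters.get(f"fetches.{metricset}.events", 0)
    return events / fetches if fetches else 0.0


if __name__ == "__main__":
    # Shortened copy of the 12:04:54 line from the question.
    sample = (" 2016-07-07T12:04:54-04:00 INFO Non-zero metrics in the last 30s: "
              "fetches.system-process.success=3 fetches.system-process.events=90 "
              "libbeat.publisher.published_events=95")
    ts, counters = parse_metrics_line(sample)
    print(ts, events_per_fetch(counters, "system-process"))
```

In the 12:04:54 line, for example, system-process reports 90 events across 3 fetches out of 95 published events, so it accounts for most of the volume in that window. That alone does not prove it causes the CPU spikes, but it is a reason to test that metricset first.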

