Metricbeat on FreeBSD using a lot of resources

Hi,

I am running Metricbeat on FreeBSD.

❯ ./metricbeat -version [3:42:34 PM]
metricbeat version 5.0.0-alpha4 (amd64), libbeat 5.0.0-alpha4

It seems to use a whole lot of resources for a simple monitoring tool:

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
12583 root            17  21    0 99776K 22784K uwait   6  40:20  41.26% metricbeat
  785 root             1  52    0 61204K  6696K select  6   5:05   0.00% sshd
32240 root             1  20    0 25448K  4960K select  6   4:21   0.00% ntpd
27606 root             3  40    0   736M   714M uwait   1   3:33   0.00% redis-serv
27608 root             3  20    0   696M   672M uwait   5   3:17   0.00% redis-serv

Here's my conf:

❯ grep -v '^#' metricbeat.yml [3:44:15 PM]

    metricbeat.modules:
    - module: system
      metricsets:
        - cpu
        #- core
        - diskio
        - filesystem
        #- fsstat
        - memory
        - network
        - process
      enabled: true
      period: 10s
      processes: ['.*']

      # if true, exports the CPU usage in ticks, together with the percentage values
      cpu_ticks: false

    - module: apache
      metricsets: ["status"]
      enabled: true
      period: 10s
      hosts: ["http://172.16.0.70"]

    - module: redis
      metricsets: ["info"]
      enabled: true
      period: 10s
      hosts: ["172.16.0.70:7000"]



    output.elasticsearch:
      hosts: ["172.16.0.0:9200"]
      template.name: "metricbeat"
      template.path: "metricbeat.template-es2x.json"
      template.overwrite: false

Any ideas?

Could you remove your filter for processes, since it matches all processes anyway? Comment out or remove the line processes: ['.*']. The regexp could be what causes the CPU load.
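A minimal sketch of that change, assuming the config file is named metricbeat.yml as in the first post. The helper name `comment_out_processes` and the output file name are made up for illustration. It comments the line out instead of deleting it, so the filter is easy to restore:

```shell
# Hypothetical helper: comment out the `processes:` filter line in a config
# read from stdin, keeping its indentation. Other lines pass through unchanged.
comment_out_processes() {
  sed -E 's/^([[:space:]]*)(processes:)/\1#\2/'
}

# Example use (commented out so nothing is written by default):
#   comment_out_processes < metricbeat.yml > metricbeat.nofilter.yml
```

The `- process` entry under `metricsets` is not affected, because the pattern only matches a line that starts with `processes:`.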

Will give it a try, and report back

Still the same issue.
I started the metricbeat process (with the new config) at 4:39 PM.

Please see attached graphs of the spike in load.

Any ideas?

Thanks,

Mike

Could you try the nightly instead of alpha4? https://beats-nightlies.s3.amazonaws.com/index.html?prefix=metricbeat/ It should not make a difference, but we recently made some changes to the FreeBSD support in gosigar. Be aware that FreeBSD is not officially supported yet, but we are trying to get it working: https://github.com/elastic/beats/issues?utf8=✓&q=freebsd%20is%3Aissue%20is%3Aopen%20

Could you try to enable one metricset after the other to see which one causes the issue?
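One way to do that without editing the file by hand each time is to generate one config per metricset. This is only a sketch: the helper name `single_metricset_config` and the output file names are invented here, and the Elasticsearch host is copied from the config in the first post and may need adjusting.

```shell
# Hypothetical helper: print a Metricbeat config that enables exactly one
# system metricset. The output section mirrors the one from the first post.
single_metricset_config() {
  cat <<EOF
metricbeat.modules:
- module: system
  metricsets:
    - $1
  enabled: true
  period: 10s

output.elasticsearch:
  hosts: ["172.16.0.0:9200"]
EOF
}

# Write one config file per metricset that was enabled in the original config.
for ms in cpu diskio filesystem memory network process; do
  single_metricset_config "$ms" > "metricbeat.$ms.yml"
done
```

Each file can then be tried in turn, for example with `./metricbeat -c metricbeat.process.yml`, while watching the WCPU column in top.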

@andrewkroh Perhaps you know more here?

I will give it a go, and report back.
Is there a special branch to check out from GitHub for the nightly build (master?)

It seems to be the "process" metricset.
When removing it from the configuration, CPU usage seems to be much more reasonable:

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
49691 root            10  20    0 55840K 18328K uwait   5   0:01   0.29% metricbeat
  785 root             1  21    0 61204K  6696K select  6   5:06   0.00% sshd
32240 root             1  20    0 25448K  4960K select  6   4:25   0.00% ntpd
27606 root             3  40    0   736M   714M uwait   1   4:15   0.00% redis-serv
27608 root             3  20    0   696M   672M uwait   5   3:58   0.00% redis-serv
  426 unbound          1  20    0 45296K 21168K select  1   2:15   0.00% unbound 

Thanks,

There was another report of high CPU usage here. Maybe they are related.

Hmm.. I see the CPU usage spike straight away.

Yes, it is best to build your binaries off master. Sorry about the link to the nightlies, as these are not available for FreeBSD yet.

Right. I am already running the latest build.

Ok, that is great.

Just as a quick summary. The minimal config that causes the issue on FreeBSD is:

metricbeat.modules:
- module: system
  metricsets:
    - process
  period: 10s

And the spike happens immediately after starting metricbeat. Correct?

How many processes do you have?

Hi,

Yes. Here is my current config:

❯ grep -v '^#' metricbeat.yml | cat -s [10:23:23 AM]

metricbeat.modules:
- module: system
  metricsets:
    - process
  enabled: true
  period: 10s

  # if true, exports the CPU usage in ticks, together with the percentage values

output.elasticsearch:
  hosts: ["172.16.0.140:9200"]
  template.name: "metricbeat"
  template.path: "metricbeat.template-es2x.json"
  template.overwrite: false

The spike happens immediately after launching the process (and remains).

 ❯ ps -ax | wc -l                                                        [10:26:27 AM]
      81

Could you try to follow the debugging recommended by @andrewkroh here? Metricbeat CPU
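The exact steps are in the linked thread and are not reproduced here. As a rough sketch of what such a CPU profiling run typically looks like, assuming Metricbeat is started with the libbeat `-httpprof` option on localhost:6060 (the address that also appears in the profile file name posted in the next reply) and that the Go toolchain is installed. The helper name `pprof_cpu_url` is made up for illustration:

```shell
# Hypothetical helper: build the URL of Go's net/http/pprof CPU profile
# endpoint for a listen address and a sampling duration in seconds
# (the duration defaults to 30 if omitted).
pprof_cpu_url() {
  printf 'http://%s/debug/pprof/profile?seconds=%s\n' "$1" "${2:-30}"
}

# Assumed workflow (commented out so the snippet has no side effects):
#   ./metricbeat -e -c metricbeat.yml -httpprof localhost:6060 &
#   go tool pprof "$(pprof_cpu_url localhost:6060 30)"
pprof_cpu_url localhost:6060 30
```

`go tool pprof` saves the fetched samples as a `.pb.gz` file, which is the kind of file that can be shared for analysis.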

Please find the pprof file here:

https://dl.dropboxusercontent.com/u/42812/Temp/pprof.localhost%3A6060.samples.cpu.001.pb.gz

@michbsd Thanks for sharing. We will have a look to see if it helps in the investigations.

Great thanks.

Let me know if I can assist in any way

Mike

@michbsd There is no real progress here yet. Nothing obvious stood out in the data you provided. We once had a similar issue on OS X but couldn't reproduce it. The problem is that I currently don't have a FreeBSD machine at hand to test it. My current assumption is that it is more likely a gosigar issue than a problem in metricbeat itself: https://github.com/elastic/gosigar There were other FreeBSD challenges in the past.

Hi @ruflin

Thanks for the feedback.
Do you suggest I open a ticket with gosigar?

thanks,

@michbsd yeah, it probably makes sense to open an issue in gosigar: https://github.com/elastic/gosigar This also brings visibility to other FreeBSD users. Make sure to link this discussion.