Significant increase in load average since using Metricbeat with Docker

Hi there,

As soon as I deployed Metricbeat following this guide (https://logz.io/blog/docker-metricbeat/), the load average started to rise significantly.

My server (4 CPUs, 4 GB RAM) only runs:

  • 37 containers
  • filebeat (collecting docker data)
  • metricbeat (collecting docker data)
  • telegraf (not collecting docker data)

You can clearly see when I turn Metricbeat on and off.

Let me know what kind of details you need to investigate further.

Nicolas

Can you check what is taking the CPU? Is it Metricbeat or the Docker daemon or something else? Please also post your Metricbeat version and configuration file. If you used the one from the logz.io blog post, it only polls every 10s, so the increase in load is strange.
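
In case it helps, here is a minimal sketch of one way to see which processes are burning CPU over a short window. It is Linux only, reads `/proc/<pid>/stat` directly, and the function names (`parse_stat_line`, `snapshot`, `top_consumers`) are made up for this example; `top` or `pidstat` would give you the same information.

```python
import os
import time
from collections import Counter


def parse_stat_line(line):
    """Return (command name, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # After the command name, index 11 is utime and index 12 is stime
    # (fields 14 and 15 in proc(5) numbering).
    return comm, int(fields[11]) + int(fields[12])


def snapshot(proc_root="/proc"):
    """Sum CPU ticks per command name across all currently visible processes."""
    totals = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, ticks = parse_stat_line(f.read())
        except (OSError, IndexError, ValueError):
            continue  # the process exited or the line could not be parsed
        totals[comm] += ticks
    return totals


def top_consumers(interval=10.0, count=5):
    """Return the commands that used the most CPU ticks during `interval` seconds."""
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    delta = Counter({name: after[name] - before.get(name, 0) for name in after})
    return [(name, ticks) for name, ticks in delta.most_common(count) if ticks > 0]
```

Calling `top_consumers(interval=10)` once with Metricbeat running and once with it stopped should show whether `dockerd`, `metricbeat`, or something else accounts for the difference.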

Hi,

If I remember correctly, it was mostly dockerd, alternating with metricbeat.

The whole stack is on version 6.0.1.

###################### Metricbeat Configuration Example #######################

# This file is an example configuration file highlighting only the most common
# options. The metricbeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/metricbeat/index.html

#==========================  Modules configuration ============================

metricbeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

metricbeat.modules:
- module: docker
  metricsets: ["container", "cpu", "diskio", "healthcheck", "info", "memory", "network"]
  hosts: ["unix:///var/run/docker.sock"]
  period: 10s


#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"
  host: "https://XXXXXX:443"

#============================= Elastic Cloud ==================================

# These settings simplify using metricbeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["XXXXXX:443"]

  # Optional protocol and basic auth credentials.
  protocol: "https"
  username: "XXXXXX"
  password: "XXXXXX"

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: critical, error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

Hi @nsteinmetz,

I'm wondering, does the performance improve when you set a higher period, like period: 30s? I suspect the Docker daemon is being loaded by Metricbeat's requests.
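
A rough back-of-the-envelope sketch of why a longer period might help. It assumes (this is not confirmed anywhere in this thread) that each of the four stats-based metricsets (cpu, diskio, memory, network) issues one Docker stats request per container per period. The function name and the model are illustrative only.

```python
def stats_requests_per_minute(containers, stats_metricsets, period_seconds):
    """Estimated Docker stats requests per minute under the assumption above."""
    return containers * stats_metricsets * 60 / period_seconds


# 37 containers comes from the original post; 4 stats-based metricsets is an assumption.
for period in (10, 30, 60):
    rate = stats_requests_per_minute(containers=37, stats_metricsets=4, period_seconds=period)
    print(f"period={period}s -> about {rate:.0f} requests/minute")
```

If the assumption holds, going from 10s to 30s cuts the request rate to a third, which would be consistent with the daemon being the bottleneck.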

Hi @exekias,

Indeed, 30s seems to have almost no impact: only about +0.1 on the load average.

I just applied this setting; I will monitor it and confirm over the coming hours.

Thanks,
Nicolas

Hmm, the load at 30s may still be too high, but at least I don't have CPU alerts yet.

[Screenshot: Grafana, crnt-cvsq, 2017-12-11]

Just switched to 60s to see if it's acceptable.

I'm wondering why Docker is so slow to respond to Metricbeat requests. Could you please check how many containers you have (both running and stopped)? You can do that by running: docker ps -a | wc -l (note that the count includes the header line).
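
If you want the running and stopped counts separately, here is a minimal sketch. It assumes the `docker` CLI is on the PATH and relies on the `Status` column starting with "Up" for running containers; the helper names are made up for this example.

```python
import subprocess
from collections import Counter


def count_by_state(status_lines):
    """Count running vs. not-running containers from `docker ps` Status strings."""
    counts = Counter()
    for line in status_lines:
        line = line.strip()
        if not line:
            continue
        # A Status such as "Up 3 hours" means running; paused containers
        # ("Up 3 hours (Paused)") are counted as running here too.
        counts["running" if line.startswith("Up") else "not running"] += 1
    return counts


def docker_statuses():
    """Return one Status string per container, including stopped ones."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Status}}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

Running `print(dict(count_by_state(docker_statuses())))` on the host prints both counts without a header line to subtract.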

Best regards

Hi,

I have 37 running containers with low traffic and none stopped.

Thanks

I've been doing some tests and couldn't reproduce this load. What version of Docker are you using?

Docker 17.09

root@cvsq-vc1m:~# docker info
Containers: 37
 Running: 37
 Paused: 0
 Stopped: 0
Images: 150
Server Version: 17.09.0-ce
Storage Driver: devicemapper
 Pool Name: docker-253:0-2230358-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 7.18GB
 Data Space Total: 107.4GB
 Data Space Available: 35.33GB
 Metadata Space Used: 12.62MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.135GB
 Thin Pool Minimum Free Space: 10.74GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.90 (2014-09-01)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Kernel Version: 4.10.8-std-1
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.857GiB
Name: cvsq-vc1m
ID: 2CGM:O6DT:GY7E:BKDI:5OPT:MZRF:F5BX:MNKO:RONJ:EBDB:HWVC:QAOC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support

Maybe I should switch to the Docker initscript from Scaleway first and see if it's better. I just noticed I'm on kernel 4.10.8-std-1.

Let's see what happens with:

root@cvsq-vc1m # docker info
Containers: 37
 Running: 37
 Paused: 0
 Stopped: 0
Images: 150
Server Version: 17.09.0-ce
Storage Driver: devicemapper
 Pool Name: docker-253:0-2230358-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 7.184GB
 Data Space Total: 107.4GB
 Data Space Available: 34.82GB
 Metadata Space Used: 12.63MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.135GB
 Thin Pool Minimum Free Space: 10.74GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.90 (2014-09-01)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Kernel Version: 4.10.8-docker-1
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.857GiB
Name: cvsq-vc1m
ID: 2CGM:O6DT:GY7E:BKDI:5OPT:MZRF:F5BX:MNKO:RONJ:EBDB:HWVC:QAOC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
