Metricbeat - high network utilisation

Hello,

We would like to use Metricbeat to monitor system stats for a number of servers across slow WAN links. With a default install of Metricbeat, my outbound network throughput is around 7 KB/s. However, I want to see normalized CPU percentages, so I updated the metricbeat.modules section of my yml file according to this. It then gives me the stats I want, but network utilisation shoots up to nearly 70 KB/s.

I have tried increasing the "period" in metricbeat.modules and tried commenting out all but the default metrics. It seems that as soon as I enable Metricbeat system metrics using the yml file, network throughput shoots up.
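For context, the change amounts to roughly this (a minimal sketch of the relevant part of my module config; the 30s period is just what I tested with):

metricbeat.modules:
- module: system
  period: 30s
  metricsets:
    - cpu
  # Report CPU usage scaled by the number of cores, so values stay within 0-100%
  cpu.metrics: ["normalized_percentages"]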

Any ideas on why it behaves so differently when using the yml file to enable system metrics?

Could you please share your complete metricbeat.yml file, the one that is causing the outbound network throughput to shoot up?

Thanks.

Here it is. I have tried commenting out most of the lines under metricbeat.modules:


#==========================  Modules configuration ============================

metricbeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false
#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================


#============================== Dashboards =====================================

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "https://****************:5601"

  # Optional protocol and basic auth credentials.
  protocol: "https"
  username: "elastic"
  password: "**********"

  ssl.enabled: true
  ssl.verification_mode: none

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:



#================================ Outputs =====================================

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["*************:9200"]

  # Optional protocol and basic auth credentials.
  protocol: "https"
  username: "elastic"
  password: "***********"

  ssl.verification_mode: none

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

# Configure processors to enhance or manipulate events generated by the beat.

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# Metricbeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

#================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

metricbeat.modules:
- module: system
  metricsets:
    - cpu             # CPU usage
    - load            # CPU load averages
    - memory          # Memory usage
    - network         # Network IO
    - process         # Per process metrics
    - process_summary # Process summary
    - uptime          # System Uptime
    - socket_summary  # Socket summary
    - core           # Per CPU core usage
    - diskio         # Disk IO
    - filesystem     # File system usage for each mountpoint
    - fsstat         # File system summary metrics
    #- raid           # Raid
    - socket         # Sockets and connection info (linux only)
  enabled: true
  period: 30s
  processes: ['.*']

  # Configure the metric types that are included by these metricsets.
  cpu.metrics:  ["normalized_percentages"]  # The other available options are percentages and ticks.
  core.metrics: ["percentages"]  # The other available option is ticks.

  # A list of filesystem types to ignore. The filesystem metricset will not
  # collect data from filesystems matching any of the specified types, and
  # fsstats will not include data from these filesystems in its summary stats.
  # If not set, types associated to virtual filesystems are automatically
  # added when this information is available in the system (e.g. the list of
  # `nodev` types in `/proc/filesystem`).
  #filesystem.ignore_types: []

  # These options allow you to filter out all processes that are not
  # in the top N by CPU or memory, in order to reduce the number of documents created.
  # If both the `by_cpu` and `by_memory` options are used, the union of the two sets
  # is included.
  #process.include_top_n:

    # Set to false to disable this feature and include all processes
    #enabled: true

    # How many processes to include from the top by CPU. The processes are sorted
    # by the `system.process.cpu.total.pct` field.
    #by_cpu: 0

    # How many processes to include from the top by memory. The processes are sorted
    # by the `system.process.memory.rss.bytes` field.
    #by_memory: 0

  # If false, cmdline of a process is not cached.
  #process.cmdline.cache.enabled: true

  # Enable collection of cgroup metrics from processes on Linux.
  #process.cgroups.enabled: true

  # A list of regular expressions used to whitelist environment variables
  # reported with the process metricset's events. Defaults to empty.
  #process.env.whitelist: []

  # Include the cumulative CPU tick values with the process metrics. Defaults
  # to false.
  #process.include_cpu_ticks: false

  # Raid mount point to monitor
  #raid.mount_point: '/'

  # Configure reverse DNS lookup on remote IP addresses in the socket metricset.
  #socket.reverse_lookup.enabled: false
  #socket.reverse_lookup.success_ttl: 60s
  #socket.reverse_lookup.failure_ttl: 60s

  # Diskio configurations
  #diskio.include_devices: []

I forgot to ask earlier: what version of Metricbeat are you running?

Thanks for sharing your configuration. It looks like, in addition to configuring cpu.metrics and core.metrics in your system module configuration, you also enabled a few other metricsets that are not enabled by default, viz. diskio, filesystem, fsstat, and socket. Could you try commenting these out to see if that reduces the outbound network throughput? It might help us narrow down the culprit.
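For example, a pared-down module block that keeps only the default metricsets plus your cpu.metrics setting might look roughly like this (a sketch, not a drop-in replacement for your full file):

metricbeat.modules:
- module: system
  enabled: true
  period: 30s
  metricsets:
    - cpu
    - load
    - memory
    - network
    - process
    - process_summary
    - uptime
    - socket_summary
    #- core
    #- diskio
    #- filesystem
    #- fsstat
    #- socket
  # Keep the normalized CPU percentages you need
  cpu.metrics: ["normalized_percentages"]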

Thanks,

Shaunak

Hi,

It is version 7.4.

I have done some testing on a dev server, turning metricsets on and off. It looks like the "process" metricset is the cause of the high network usage. It also looks like some other metricsets rely on this one, since with process disabled the agent will not start if I enable "process_summary" or "uptime".

Right now I have it working with disk I/O and normalized-percentage CPU, which were my main requirements, and network throughput is not too bad. I will be moving to a production server to test a little later today.

Thanks

- module: system
  metricsets:
    - cpu             # CPU usage
    - load            # CPU load averages
    - memory          # Memory usage
    - network         # Network IO
    - process         # Per process metrics
    - process_summary # Process summary
    - uptime          # System Uptime
    - socket_summary  # Socket summary
    #- core           # Per CPU core usage
    - diskio          # Disk IO
    - filesystem      # File system usage for each mountpoint
    #- fsstat         # File system summary metrics
    #- raid           # Raid
    #- socket         # Sockets and connection info (linux only)
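If we end up needing richer per-process data later, I may experiment with the process.include_top_n options documented in the reference config to cap how many process documents get shipped per period; a sketch (the top-5 values are arbitrary picks, not recommendations):

- module: system
  period: 30s
  metricsets:
    - cpu
    - process
  process.include_top_n:
    enabled: true
    # Ship only the five busiest processes by CPU and by memory;
    # the union of the two sets is included.
    by_cpu: 5
    by_memory: 5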

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.