Metricbeat system module filtering tips and tricks

Hello/FYI,

I had problems using the Metricbeat filters, but I finally figured them out with the help of responses from andrewkroh and monica. Here are a few highlights that may be helpful. The instructions and comments are for Metricbeat 5.1.1 on top of RHEL.

There were two things that I did not understand related to system module filters that caused problems:

  1. The system module configuration can be written multiple times, so the filters can be applied individually to each metric set.

  2. The filter syntax 'drop_event.when.regexp.name' works, but a syntax that includes the whole JSON field path, such as 'drop_event.when.regexp.system.process.name', does not seem to work.
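
To illustrate the second point, here is a minimal sketch of the two spellings side by side for the process metric set. The pattern '^sshd$' is only a placeholder chosen for illustration, and the behavior noted in the comments is what was observed on 5.1.1, not a documented guarantee.

```yaml
metricbeat.modules:
- module: system
  period: 10s
  metricsets: ["process"]
  filters:
    # Worked on 5.1.1: the field name is given relative to the metric set.
    - drop_event.when.regexp.name: '^sshd$'
    # Did not seem to work on 5.1.1: the full JSON path of the field.
    # - drop_event.when.regexp.system.process.name: '^sshd$'
```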

Adding a filter is easier if you do it one metric set at a time and apply a filter in Kibana to show only the metric set you are currently working on. For example, first apply the filters for the filesystem metric set and then configure the diskio metric set.

# First step for metricbeat.yml
metricbeat.modules:
- module: system
  period: 10s
  metricsets: ["filesystem"]
  filters:
    - drop_event.when.regexp.mount_point: '^/(sys|cgroup|proc|dev|etc|host|run|var)($|/)'

# Apply a Kibana filter in the Discover tab for the metricbeat index to check that the filesystem filtering works.
> {"bool":{"must":{"match":{"metricset.name":"filesystem"}}}}

# Second step for metricbeat.yml
metricbeat.modules:
- module: system
  period: 10s
  metricsets: ["filesystem"]
  filters:
    - drop_event.when.regexp.mount_point: '^/(sys|cgroup|proc|dev|etc|host|run|var)($|/)'

- module: system
  period: 10s
  metricsets: ["diskio"]
  filters:
    - drop_event.when.regexp.name: '^sr0$'

# Apply a Kibana filter in the Discover tab for the metricbeat index to check that the diskio filtering works.
> {"bool":{"must":{"match":{"metricset.name":"diskio"}}}}

There are a few issues that you should be aware of (please comment if there is a solution or if I have misunderstood something):

  1. At least with version 5.1.1, it does not seem possible to select the events to keep instead of the events to drop. This leads to a long regexp filter to drop the unnecessary services for the process metric set. The problem seems to be that Golang regexps do not support lookahead matches, which would allow writing a regexp that behaves like "drop all events except 'kafka|zookeeper|spark'". And there is no 'match_event' filter processor.

  2. Filtering out the unnecessary processes for the process metric set is complicated. The problem is that the process name is often just 'java' or 'python' for many different processes. Unless you can filter on the service user (which requires that the services do not all run as root) or on the service startup command, filtering the services is difficult for the process metric set. I do not know if it is possible to write a filter based on two fields for the same metric set.

  3. To scale the process metric set to multiple hosts, filtering is most likely needed. Without filters, a single host produces roughly 1000 events in a 30 second window by default. This does not include core-specific CPU metrics.

  4. At least with the latest OpenStack releases, the volume devices are allocated randomly, so e.g. vdb or vdc can end up behind any given mount point. The diskio metric set does not include the mount point name, which would stay constant and allow queries that always work.
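
For issues 1 and 2, something that may be worth trying: the Beats processor documentation lists 'not' and 'and' conditions next to 'regexp' and 'equals'. The fragments below are an untested sketch, assuming that those conditions are also accepted in the module-level filters section on 5.1.1 and that the process metric set has a 'username' field that can be referenced by its short name in the same way as 'name'. The service names and the 'root' user are only illustrative. Each fragment is meant as an alternative filters section for the process metric set entry.

```yaml
# Variant A (assumption: 'not' inverts the condition).
# Drops every process whose name does NOT match, so only the
# listed services would be kept.
filters:
  - drop_event:
      when:
        not:
          regexp:
            name: '^(kafka|zookeeper|spark)'
---
# Variant B (assumption: 'and' combines conditions on two fields).
# Drops java/python processes only when they run as root.
filters:
  - drop_event:
      when:
        and:
          - equals:
              username: 'root'
          - regexp:
              name: '^(java|python)$'
```

Checking one variant at a time with the Kibana filter for metricset.name "process" should show quickly whether the condition is honored.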

Here is one example of a system module configuration (it contains a truncated process filter to mask out some services):

metricbeat.modules:

  - module: system
    period: 10s
    metricsets: ["cpu"]

  - module: system
    period: 10s
    metricsets: ["load"]

  - module: system
    period: 10s
    metricsets: ["memory"]

  - module: system
    period: 10s
    metricsets: ["network"]

  - module: system
    period: 10s
    metricsets: ["filesystem"]
    filters:
      - drop_event.when.regexp.mount_point: '^/(sys|cgroup|proc|dev|etc|host|run|var)($|/)'

  - module: system
    period: 10s
    metricsets: ["fsstat"]

  - module: system
    period: 10s
    metricsets: ["diskio"]
    filters:
      - drop_event.when.regexp.name: '^sr0$'

  - module: system
    period: 10s
    metricsets: ["process"]
    filters:
      - drop_event.when.regexp.name: '^.*(kworker|ksoftirqd|rcu|watchdog|migration|kthread|rcu_sched|systemd|agetty|auditd|sshd|bash|ksmd|lvmetad|scsi|khungtaskd|jbd2|kblockd|bioset|dbus|khelper|kmpath|kintegrityd|khugepaged|fsnotify|ata_sff|LCPDEV|perf|crond|irqbalance|kdevtmpfs|writeback|deferwq|kswapd|kthrotld|kpsmoused|tail|vballoon|polkitd|metricbeat|ttm_swap|syslog|udp_rcv).*'