Processing metric sets to reduce data amount

(Heikki Laaksonen) #1

Has anyone succeeded in filtering out Metricbeat events with metricbeat-5.0.0-rc1-x86_64.rpm? What I am trying to do is get the memory and CPU load statistics from only a few selected processes. However, I am not able to get the filters to work as expected.

I was not able to do this with the 'processes' field, where I tried syntax variations like processes: ['kafka.|zookeeper.']. I was also not sure which process fields this maps to. Then I tried to create an inverted regexp match with drop_event, but I could not even get basic filtering of an event to work with example /1/.

Has anyone created a filter with rc1 that enables getting CPU and memory only from selected processes?

Is it possible to create an inverted drop filter based on regexps on different keys, as in /2/? Some processes run as a specific user and some run as root, which requires using different fields to filter the data.

The optimization I am after is this: I have only a single-node Elasticsearch to store the measurements, but multiple nodes that produce the data. With the default configuration, the number of events gets pretty large once more than 10 nodes are producing data.



/1/

- module: system
    # CPU stats
    - cpu
    # System Load stats
    - load
    # Per CPU core stats
    #- core
    # IO stats
    - diskio
    # Per filesystem stats
    - filesystem
    # File system summary stats
    - fsstat
    # Memory stats
    - memory
    # Network stats
    - network
    # Per process stats
    - process
      enabled: true
      period: 10s
      processes: ['.*']
    - drop_event:
        system.process.username: 'nerve'

/2/

- drop_event:
    system.process.username: '^nerve|^synapse'
    system.process.cmdline: '(^.synapse.)|(^.nerve.)'

(Monica Sarbu) #2

The processes configuration option accepts a list of regular expressions. I created a Go playground to help you construct the right regular expressions. You just need to add a call to the Match function in the main function to check whether a string matches a certain regular expression. For example, I used the regular expression kafka.|zookeeper. to see if it matches the kafka process, and it fails:

Match("kafka", "kafka.|zookeeper.")
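For readers without access to the playground, a check like this can be sketched with Go's standard regexp package. The Match helper below is an assumption about what the playground function looks like, not a copy of it. The pattern kafka.|zookeeper. needs one more character after "kafka", so the bare name does not match, while kafka.*|zookeeper.* does.

```go
package main

import (
	"fmt"
	"regexp"
)

// Match reports whether pattern matches anywhere in name.
// Hypothetical helper for illustration; an invalid pattern counts as no match.
func Match(name, pattern string) bool {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	return re.MatchString(name)
}

func main() {
	// "kafka." requires a sixth character after "kafka", so this fails.
	fmt.Println(Match("kafka", "kafka.|zookeeper."))
	// ".*" also accepts zero trailing characters, so this succeeds.
	fmt.Println(Match("kafka", "kafka.*|zookeeper.*"))
	// An unrelated process name matches neither alternative.
	fmt.Println(Match("java", "kafka.*|zookeeper.*"))
}
```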

Please let me know if you have further questions.

(Heikki Laaksonen) #3

I was able to get 'processes: ['java']' to match only the Java processes, but I cannot get anything else to work. I also tried basic filtering with the 'processes: ['kafka']' syntax, but that did not produce any process-level metrics.

It may be that I have misunderstood how this is expected to work. I am assuming that it would, for example, match the user name running the process or the command line used to start the process /1/. I tried 'processes: ['elastic']' and I think it matched some zombie process or script named 'elasticsearch_h' whose PID is constantly changing /2/. It did not seem to match the actual Elasticsearch service that we are running.

This may be a silly mistake on my end as well. If you can clarify which Linux command output the 'processes: ' filtering matches against, I could check how that looks in our Linux (RHEL 7.2) environment.


/1/

Tasks: 313 total, 1 running, 309 sleeping, 0 stopped, 3 zombie
%Cpu(s): 2.9 us, 0.8 sy, 0.0 ni, 95.9 id, 0.2 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 20888560 total, 8518116 free, 6685824 used, 5684620 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 13187196 avail Mem

2728 kafka 20 0 12.378g 1.156g 15700 S 6.2 5.8 116:10.70 java
$ ps -aef | grep 2728
kafka 2728 1 3 Oct25 ? 01:56:14 java -Xms kafka.Kafka /etc/kafka/

/2/

{
  "_index": "metricbeat-2016.10.27",
  "_type": "metricsets",
  "_id": "AVgF7YZtd5zAnjYIOEyt",
  "_score": null,
  "_source": {
    "@timestamp": "2016-10-27T11:36:41.438Z",
    "beat": {
      "hostname": "esps0.local",
      "name": "esps0.local"
    },
    "metricset": {
      "module": "system",
      "name": "process",
      "rtt": 25662
    },
    "system": {
      "process": {
        "cpu": {
          "start_time": "2016-10-27T11:36:37.000Z",
          "total": {
            "pct": 0
          }
        },
        "fd": {
          "limit": {
            "hard": 4096,
            "soft": 1024
          },
          "open": 0
        },
        "memory": {
          "rss": {
            "bytes": 0,
            "pct": 0
          },
          "share": 0,
          "size": 0
        },
        "name": "elasticsearch_h",
        "pgid": 8800,
        "pid": 8800,
        "ppid": 1219,
        "state": "zombie",
        "username": "root"
      }
    },
    "type": "metricsets"
  }
}
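One way to reason about the results above is to test each pattern against the separate fields of the Kafka process from the top and ps output. The sketch below does that with Go's regexp package. Proc and MatchingFields are hypothetical names for illustration and are not part of Metricbeat. If the filter were compared only against the process name (an assumption, not something confirmed in this thread), that would be consistent with 'java' producing metrics and 'kafka' producing none.

```go
package main

import (
	"fmt"
	"regexp"
)

// Proc holds three fields a process filter could plausibly be tested against.
// Hypothetical type for illustration only.
type Proc struct {
	Name     string
	Username string
	Cmdline  string
}

// MatchingFields returns the names of the fields of p that pattern matches.
func MatchingFields(p Proc, pattern string) []string {
	re := regexp.MustCompile(pattern)
	var hits []string
	if re.MatchString(p.Name) {
		hits = append(hits, "name")
	}
	if re.MatchString(p.Username) {
		hits = append(hits, "username")
	}
	if re.MatchString(p.Cmdline) {
		hits = append(hits, "cmdline")
	}
	return hits
}

func main() {
	// Values taken from the top and ps output for PID 2728.
	kafka := Proc{
		Name:     "java",
		Username: "kafka",
		Cmdline:  "java -Xms kafka.Kafka /etc/kafka/",
	}
	fmt.Println(MatchingFields(kafka, "java"))  // [name cmdline]
	fmt.Println(MatchingFields(kafka, "kafka")) // [username cmdline]
}
```

Only the pattern 'java' hits the name field here, so a name-only comparison would explain why 'kafka' returned nothing.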

(system) #4

This topic was automatically closed after 21 days. New replies are no longer allowed.