Processing metric sets to reduce data amount

heilaaks · October 19, 2016, 10:38am

Has anyone succeeded in filtering Metricbeat events with metricbeat-5.0.0-rc1-x86_64.rpm? I am trying to collect memory and CPU load statistics only for a few selected processes. However, I cannot get the filters to work as expected.

I was not able to do this with the 'processes' option, where I tried syntax variations such as processes: ['kafka.|zookeeper.']. I was also not sure which process fields this option is matched against. Then I tried to create an inverted regexp match with drop_event, but I could not even get basic event filtering to work with the configuration in /1/.

Has anyone created a filter with rc1 that collects CPU and memory statistics only for selected processes?

Is it possible to create an inverted drop filter based on regexp matches against different keys, as in /2/? Some processes run as a specific user and some run as root, so different fields are needed to filter the data.

The reason for this optimization is that I have only a single-node Elasticsearch cluster to store the measurements, but multiple nodes producing the data. With the default configuration, the number of events gets quite large once more than 10 nodes are producing data.
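To put the volume concern into numbers, here is a rough back-of-the-envelope estimate in Go. It only counts system/process events, assumes each matching process yields one event per collection period, and reuses the 313 tasks from the top output later in this thread purely as an example value; the function name processEventsPerDay is hypothetical.

```go
package main

import "fmt"

// secondsPerDay is the number of seconds in one day.
const secondsPerDay = 86400

// processEventsPerDay estimates how many system/process events a fleet
// sends per day, assuming every node reports the same number of matching
// processes on every collection period (a simplification).
func processEventsPerDay(nodes, processesPerNode, periodSeconds int) int {
	collectionsPerDay := secondsPerDay / periodSeconds
	return nodes * processesPerNode * collectionsPerDay
}

func main() {
	// processes: ['.*'] on 10 nodes with ~313 tasks each, period 10s.
	all := processEventsPerDay(10, 313, 10)
	// Only two selected processes per node, e.g. Kafka and ZooKeeper.
	selected := processEventsPerDay(10, 2, 10)
	fmt.Println("all processes:     ", all)      // 27043200
	fmt.Println("selected processes:", selected) // 172800
}
```

Under these assumptions, restricting the process metricset to a couple of processes cuts its share of the daily volume by more than two orders of magnitude, before counting the other metricsets at all.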

Thanks.

/1/

- module: system
  metricsets:
    # CPU stats
    - cpu
    # System Load stats
    - load
    # Per CPU core stats
    #- core
    # IO stats
    - diskio
    # Per filesystem stats
    - filesystem
    # File system summary stats
    - fsstat
    # Memory stats
    - memory
    # Network stats
    - network
    # Per process stats
    - process
  enabled: true
  period: 10s
  processes: ['.*']
  processors:
    - drop_event:
        when:
          regexp:
            system.process.username: 'nerve'

/2/
processors:
  - drop_event:
      when:
        regexp:
          system.process.username: '^nerve|^synapse'
          system.process.cmdline: '(^.synapse.)|(^.nerve.)'
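The intent behind /2/ is a keep-list: drop every process event unless the username or the command line matches. The Go sketch below only models that OR logic with Go's standard regexp package (RE2 syntax); processEvent and shouldDrop are hypothetical names, not libbeat types, and the sketch does not show how to express this in metricbeat.yml. Note that, as far as I understand libbeat conditions, listing two fields under one regexp condition requires both to match, which is an AND rather than the OR sketched here.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied as they appear in /2/.
var (
	usernameRe = regexp.MustCompile(`^nerve|^synapse`)
	cmdlineRe  = regexp.MustCompile(`(^.synapse.)|(^.nerve.)`)
)

// processEvent is a hypothetical, flattened stand-in for a Metricbeat
// system/process event; it is not a libbeat type.
type processEvent struct {
	Username string
	Cmdline  string
}

// shouldDrop keeps an event if its username OR its command line matches,
// and drops everything else (an "inverted" drop filter).
func shouldDrop(e processEvent) bool {
	keep := usernameRe.MatchString(e.Username) || cmdlineRe.MatchString(e.Cmdline)
	return !keep
}

func main() {
	events := []processEvent{
		{Username: "nerve"},
		{Username: "synapse"},
		{Username: "kafka", Cmdline: "java -Xms kafka.Kafka"},
	}
	for _, e := range events {
		fmt.Printf("%-8s drop=%v\n", e.Username, shouldDrop(e))
	}
}
```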

monica · October 19, 2016, 12:42pm

The processes configuration option accepts a list of regular expressions. I created a Go playground to help you construct the right regular expressions. You just need to add a call to the Match function in main to check whether a string matches a given regular expression. For example, I used the regular expression kafka.|zookeeper. to check whether it matches the kafka process, and it fails:

Match("kafka", "kafka.|zookeeper.")
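For readers without access to that playground, here is a minimal stand-alone equivalent. The Match helper below is a guess at what the playground's function does (an unanchored regexp.MatchString check), not its actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Match reports whether s contains a match of pattern. This is a
// hypothetical reconstruction of the playground helper.
func Match(s, pattern string) bool {
	matched, err := regexp.MatchString(pattern, s)
	if err != nil {
		fmt.Println("invalid pattern:", err)
		return false
	}
	return matched
}

func main() {
	// "kafka." needs at least one character after "kafka", so this fails.
	fmt.Println(Match("kafka", "kafka.|zookeeper.")) // false
	// A plain substring pattern matches.
	fmt.Println(Match("java", "java")) // true
	// Unanchored: "kafka." matches the first six characters here.
	fmt.Println(Match("kafka.Kafka", "kafka.|zookeeper.")) // true
}
```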

Please let me know if you have further questions.

heilaaks · October 27, 2016, 11:44am

I was able to get processes: ['java'] to match only the Java processes, but I cannot get anything else to work. I also tried basic filtering with processes: ['kafka'], but that did not produce any process-level metrics.

It may be that I have misunderstood how this is expected to work. I am assuming that the pattern would match, for example, the user name running the process or the command line used to start it /1/. I tried processes: ['elastic'], and I think it matched some zombie process or script named 'elasticsearch_h' whose PID keeps changing /2/. It did not seem to match the actual Elasticsearch service that we are running.

This may also be a silly mistake on my end. If you could clarify which Linux command output the processes: filter matches against, I could check how that appears in our Linux (RHEL 7.2) environment.
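On Linux, the two usual candidates are the short process name in /proc/&lt;pid&gt;/comm and the full command line in /proc/&lt;pid&gt;/cmdline, roughly what `ps -p <pid> -o comm=` and `ps -p <pid> -o args=` print. The sketch below reads both for one PID so they can be compared; which of them rc1 actually matches against is not confirmed in this thread. The kernel limits comm to 15 characters, and 'elasticsearch_h' is exactly 15 characters long, which would fit a longer script name being cut off (a guess). A zombie also has an empty cmdline. splitCmdline is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// splitCmdline turns the raw contents of /proc/<pid>/cmdline, where
// arguments are separated (and terminated) by NUL bytes, into a slice.
func splitCmdline(raw []byte) []string {
	s := strings.TrimRight(string(raw), "\x00")
	if s == "" {
		return nil // e.g. zombies and kernel threads
	}
	return strings.Split(s, "\x00")
}

func main() {
	pid := "self" // pass a PID such as 2728 as the first argument
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	comm, err := os.ReadFile("/proc/" + pid + "/comm")
	if err != nil {
		fmt.Println("read comm:", err)
		return
	}
	cmdline, err := os.ReadFile("/proc/" + pid + "/cmdline")
	if err != nil {
		fmt.Println("read cmdline:", err)
		return
	}
	fmt.Printf("name (comm): %q\n", strings.TrimSpace(string(comm)))
	fmt.Printf("cmdline:     %q\n", strings.Join(splitCmdline(cmdline), " "))
}
```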

Thanks

/1/
Tasks: 313 total, 1 running, 309 sleeping, 0 stopped, 3 zombie
%Cpu(s): 2.9 us, 0.8 sy, 0.0 ni, 95.9 id, 0.2 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 20888560 total, 8518116 free, 6685824 used, 5684620 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 13187196 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2728 kafka     20   0 12.378g 1.156g  15700 S   6.2  5.8 116:10.70 java
...
$ ps -aef | grep 2728
kafka 2728 1 3 Oct25 ? 01:56:14 java -Xms kafka.Kafka /etc/kafka/server.properties

/2/
{
  "_index": "metricbeat-2016.10.27",
  "_type": "metricsets",
  "_id": "AVgF7YZtd5zAnjYIOEyt",
  "_score": null,
  "_source": {
    "@timestamp": "2016-10-27T11:36:41.438Z",
    "beat": {
      "hostname": "esps0.local",
      "name": "esps0.local"
    },
    "metricset": {
      "module": "system",
      "name": "process",
      "rtt": 25662
    },
    "system": {
      "process": {
        "cpu": {
          "start_time": "2016-10-27T11:36:37.000Z",
          "total": {
            "pct": 0
          }
        },
        "fd": {
          "limit": {
            "hard": 4096,
            "soft": 1024
          },
          "open": 0
        },
        "memory": {
          "rss": {
            "bytes": 0,
            "pct": 0
          },
          "share": 0,
          "size": 0
        },
        "name": "elasticsearch_h",
        "pgid": 8800,
        "pid": 8800,
        "ppid": 1219,
        "state": "zombie",
        "username": "root"
      }
    },
    "type": "metricsets"
  },
  "fields": {
    "@timestamp": [
      1477568201438
    ],
    "system.process.cpu.start_time": [
      1477568197000
    ]
  },
  "highlight": {
    "system.process.name": [
      "@kibana-highlighted-field@elasticsearch_h@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1477568201438
  ]
}
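The observations above (java matches, kafka produces nothing, elastic hits only the zombie) are consistent with the pattern being checked against the short process name rather than the full command line: the Kafka broker and the Elasticsearch service both run as processes named java. That is an inference from this thread, not confirmed rc1 behavior. The Go sketch compares both hypotheses using the values from /1/ and /2/; the proc type and matches helper are hypothetical, and the zombie's command line is assumed empty because /2/ shows no cmdline field.

```go
package main

import (
	"fmt"
	"regexp"
)

// proc holds the two strings a "processes" pattern could be matched
// against, copied from /1/ and /2/ above.
type proc struct {
	name    string
	cmdline string
}

// matches reports whether pattern matches the process name and the
// command line, respectively.
func matches(pattern string, p proc) (byName, byCmdline bool) {
	re := regexp.MustCompile(pattern)
	return re.MatchString(p.name), re.MatchString(p.cmdline)
}

func main() {
	procs := []proc{
		{"java", "java -Xms kafka.Kafka /etc/kafka/server.properties"},
		{"elasticsearch_h", ""}, // zombie from /2/, cmdline assumed empty
	}
	for _, pattern := range []string{"java", "kafka", "elastic"} {
		for _, p := range procs {
			byName, byCmdline := matches(pattern, p)
			fmt.Printf("%-8s %-16s name=%-5v cmdline=%v\n",
				pattern, p.name, byName, byCmdline)
		}
	}
}
```

If name matching is what rc1 does, only the name column should line up with what Metricbeat reported.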

system · November 9, 2016, 10:38am

This topic was automatically closed after 21 days. New replies are no longer allowed.
