Hello!
I'm trying to create a Watcher alert that triggers when any process on a node uses more than 95% CPU (system.process.cpu.total.norm.pct >= 0.95) over the last hour.
Here is an example of my config:
```json
{
  "trigger": {
    "schedule": {
      "interval": "10m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "metricbeat*"
        ],
        "types": [],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "system.process.cpu.total.norm.pct": {
                      "gte": 0.95
                    }
                  }
                },
                {
                  "range": {
                    "system.process.cpu.start_time": {
                      "gte": "now-1h"
                    }
                  }
                },
                {
                  "match": {
                    "environment": "test"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "send-to-slack": {
      "throttle_period_in_millis": 1800000,
      "webhook": {
        "scheme": "https",
        "host": "hooks.slack.com",
        "port": 443,
        "method": "post",
        "path": "{{ctx.metadata.onovozhylov-test}}",
        "params": {},
        "headers": {
          "Content-Type": "application/json"
        },
        "body": "{ \"text\": \" ==========\nTest parameters:\n\tthrottle_period_in_millis: 60000\n\tInterval: 1m\n\tcpu.total.norm.pct: 0.5\n\tcpu.start_time: now-1m\n\nThe watcher:*{{ctx.watch_id}}* in env:*{{ctx.metadata.env}}* found that the process *{{ctx.system.process.name}}* has been utilizing CPU over 95% for the past 1 hr on node:\n{{#ctx.payload.nodes}}\t{{.}}\n\n{{/ctx.payload.nodes}}\n\nThe runbook entry is here: *{{ctx.metadata.runbook}}* \"}"
      }
    }
  },
  "metadata": {
    "onovozhylov-test": "/services/T0U0CFMT4/BBK1A2AAH/MlHAF2QuPjGZV95dvO11111111",
    "env": "{{ grains.get('environment') }}",
    "runbook": "http://mytest.com"
  }
}
```
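One more thing I'm unsure about: as written, the query matches if any single sample in the window reaches 0.95, which is not the same as staying above 95% for the whole hour. Below is a rough, untested sketch of a sustained check. It assumes the Metricbeat 6.x field names used in my config, that @timestamp is the event time field, and that system.process.name is a keyword field; per_process, min_cpu and sustained_only are made-up aggregation names. Instead of filtering on CPU, it groups the last hour's samples by process name, keeps only the buckets whose minimum CPU value is at least 0.95 (bucket_selector), and fires when at least one bucket survives:

```json
{
  "input": {
    "search": {
      "request": {
        "indices": ["metricbeat*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "range": { "@timestamp": { "gte": "now-1h" } } },
                { "match": { "environment": "test" } }
              ]
            }
          },
          "aggs": {
            "per_process": {
              "terms": { "field": "system.process.name", "size": 50 },
              "aggs": {
                "min_cpu": { "min": { "field": "system.process.cpu.total.norm.pct" } },
                "sustained_only": {
                  "bucket_selector": {
                    "buckets_path": { "minCpu": "min_cpu" },
                    "script": "params.minCpu >= 0.95"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "return ctx.payload.aggregations.per_process.buckets.size() > 0"
    }
  }
}
```

Grouping only by name merges the same process name across all nodes, so a per-node check would probably need a terms aggregation on the host field wrapped around per_process.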
My watch doesn't work once I add the range filter on system.process.cpu.start_time. Perhaps this isn't the right field to use...
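If I read the Metricbeat field reference correctly, system.process.cpu.start_time is the time the process was started, so a now-1h range on it only matches processes that were started within the last hour, not samples collected during the last hour. A sketch of the request body with the time window moved to @timestamp instead (assuming @timestamp is the event time field, which is the Beats default):

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "range": { "system.process.cpu.total.norm.pct": { "gte": 0.95 } } },
        { "range": { "@timestamp": { "gte": "now-1h" } } },
        { "match": { "environment": "test" } }
      ]
    }
  }
}
```

This would replace the body object under input.search.request; the rest of the watch stays the same.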
The other issue is that I don't know how to include system.process.name in the message body.
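As far as I understand, the watch payload only holds the search response, so ctx.system.process.name (and ctx.payload.nodes) probably render as empty strings. A possible, untested approach is to return the host and process names as terms aggregations and loop over the buckets with Mustache sections in the Slack body. The field beat.hostname (Metricbeat 6.x; 7.x uses host.name) and the aggregation names hosts and processes are assumptions for illustration; the throttle period, condition and metadata would stay as in my original watch:

```json
{
  "input": {
    "search": {
      "request": {
        "indices": ["metricbeat*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "range": { "system.process.cpu.total.norm.pct": { "gte": 0.95 } } },
                { "range": { "@timestamp": { "gte": "now-1h" } } },
                { "match": { "environment": "test" } }
              ]
            }
          },
          "aggs": {
            "hosts": {
              "terms": { "field": "beat.hostname", "size": 20 },
              "aggs": {
                "processes": {
                  "terms": { "field": "system.process.name", "size": 10 }
                }
              }
            }
          }
        }
      }
    }
  },
  "actions": {
    "send-to-slack": {
      "webhook": {
        "scheme": "https",
        "host": "hooks.slack.com",
        "port": 443,
        "method": "post",
        "path": "{{ctx.metadata.onovozhylov-test}}",
        "headers": { "Content-Type": "application/json" },
        "body": "{ \"text\": \"High CPU in env *{{ctx.metadata.env}}*:\\n{{#ctx.payload.aggregations.hosts.buckets}}node *{{key}}*: {{#processes.buckets}}{{key}} {{/processes.buckets}}\\n{{/ctx.payload.aggregations.hosts.buckets}}\" }"
      }
    }
  }
}
```

Inside the outer section {{key}} is the host name, and inside the nested {{#processes.buckets}} section it is the process name. The \\n sequences are double-escaped so the rendered Slack payload contains JSON newline escapes rather than raw newline characters.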
Thanks in advance for any help!