I have been using Topbeat to collect and monitor disk usage data. I have built a bar chart in Kibana showing the top 5 servers by disk usage percentage (fs.used_p).
How can I use Watcher to send an alert when the percentage reaches a certain value?
Hey,
That's not too hard. The important part is getting the query right, and that depends on what you want to alert on. I used this as an example:
GET topbeat-*/filesystem/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-1m",
              "lte": "now"
            }
          }
        },
        {
          "range": {
            "fs.used_p": {
              "gte": 0.4
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_host": {
      "terms": {
        "field": "beat.hostname",
        "size": 10
      },
      "aggs": {
        "by_fs": {
          "terms": {
            "field": "fs.device_name",
            "size": 10
          },
          "aggs": {
            "max": {
              "max": {
                "field": "fs.used_p"
              }
            }
          }
        }
      }
    }
  }
}
It returns data like this:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "by_host": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "orca",
          "doc_count": 10,
          "by_fs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "/dev/disk1",
                "doc_count": 5,
                "max": {
                  "value": 0.49000000953674316
                }
              },
              {
                "key": "devfs",
                "doc_count": 5,
                "max": {
                  "value": 1
                }
              }
            ]
          }
        }
      ]
    }
  }
}
As you can see, I searched the last minute of data from my notebook for filesystems that are at least 40% full (fs.used_p is a ratio, so 0.4), aggregated by host and device, and returned the maximum usage for each.
You can easily turn this into a fully fledged watch:
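To make the shape of that response a bit more concrete, here is a minimal Python sketch that walks the nested by_host / by_fs buckets and flattens them into (host, device, max usage) rows. The function name flatten_usage and the trimmed sample dict are my own illustration and not part of Topbeat or Watcher; only the bucket names come from the query above.

```python
def flatten_usage(response):
    """Flatten the by_host -> by_fs -> max aggregation into rows."""
    rows = []
    for host in response["aggregations"]["by_host"]["buckets"]:
        for fs in host["by_fs"]["buckets"]:
            rows.append((host["key"], fs["key"], fs["max"]["value"]))
    return rows


# Trimmed copy of the search response from this post (illustrative only).
sample = {
    "aggregations": {
        "by_host": {
            "buckets": [
                {
                    "key": "orca",
                    "by_fs": {
                        "buckets": [
                            {"key": "/dev/disk1", "max": {"value": 0.49000000953674316}},
                            {"key": "devfs", "max": {"value": 1}},
                        ]
                    },
                }
            ]
        }
    }
}

for host, device, used in flatten_usage(sample):
    # fs.used_p is a ratio, so format it as a percentage
    print(f"{host} {device} {used:.0%}")
```

Each row corresponds to one inner bucket, which is also what the email template in a watch would iterate over.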
PUT _watcher/watch/free_space
{
  "metadata": {
    "used_percent": 0.4
  },
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "<topbeat-{now/d}>"
        ],
        "types": [
          "filesystem"
        ],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-1m",
                      "lte": "now"
                    }
                  }
                },
                {
                  "range": {
                    "fs.used_p": {
                      "gte": 0.4
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "by_host": {
              "terms": {
                "field": "beat.hostname",
                "size": 100
              },
              "aggs": {
                "by_fs": {
                  "terms": {
                    "field": "fs.device_name"
                  },
                  "aggs": {
                    "max": {
                      "max": {
                        "field": "fs.used_p"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gte": 5
      }
    }
  },
  "actions": {
    "email_alert": {
      "throttle_period": "15m",
      "email": {
        "to": "user@example.org",
        "subject": "Watcher {{ctx.watch_id}} at {{ctx.trigger.triggered_time}}: Filesystem Usage Alert for some hosts",
        "body": "{{#ctx.payload.aggregations.by_host.buckets}}Host {{key}} with {{#by_fs.buckets}}[{{key}}/{{max.value}}]{{/by_fs.buckets}}\n{{/ctx.payload.aggregations.by_host.buckets}}"
      }
    }
  }
}
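One thing worth spelling out: in this watch the usage threshold is applied by the range filter inside the query, while the compare condition only looks at how many documents matched. A tiny Python sketch of that check, with condition_met as a made-up helper name and the payloads reduced to the one field the condition reads:

```python
def condition_met(payload, min_hits=5):
    # Mirrors the compare condition: ctx.payload.hits.total gte 5
    return payload["hits"]["total"] >= min_hits


print(condition_met({"hits": {"total": 10}}))  # enough matching documents
print(condition_met({"hits": {"total": 3}}))   # too few matching documents
```

So the email action runs only when at least five filesystem documents from the last minute were at or above the threshold; adjust that number to fit how many hosts and disks report in.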
If you look at the email you receive, you will see that it is pretty raw. It is worth investing some time in a transform that produces nicer values for the output.
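As a rough idea of what such a reshaping step could compute, here is a Python sketch that turns the ratios into whole percentages and sorts each host's disks by usage. This is plain Python for illustration, not a Watcher transform script; in a real watch you would express the same logic in whatever script language your Watcher version supports. The name summarize and the sample payload are assumptions of mine.

```python
def summarize(payload, threshold=0.4):
    """Build one readable line per host from the by_host/by_fs buckets."""
    lines = []
    for host in payload["aggregations"]["by_host"]["buckets"]:
        disks = [
            (fs["key"], round(fs["max"]["value"] * 100))
            for fs in host["by_fs"]["buckets"]
            if fs["max"]["value"] >= threshold
        ]
        # Fullest disk first
        disks.sort(key=lambda d: d[1], reverse=True)
        parts = ", ".join(f"{name} at {pct}%" for name, pct in disks)
        lines.append(f"Host {host['key']}: {parts}")
    return lines


# Trimmed copy of the search response from this post (illustrative only).
payload = {
    "aggregations": {
        "by_host": {
            "buckets": [
                {
                    "key": "orca",
                    "by_fs": {
                        "buckets": [
                            {"key": "/dev/disk1", "max": {"value": 0.49000000953674316}},
                            {"key": "devfs", "max": {"value": 1}},
                        ]
                    },
                }
            ]
        }
    }
}

print("\n".join(summarize(payload)))
```

With the sample data this gives a single line listing devfs at 100% before /dev/disk1 at 49%, which reads better than the raw float values in the template output.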
But this is the basic idea.
Hope this helps.
--Alex