Use watcher with topbeat

alerting

(Kennedy Kan) #1

I have been using Topbeat to collect and monitor disk usage data. I have built a bar chart in Kibana showing the top 5 servers by disk usage percentage (fs.used_p).
How can I use Watcher to send an alert when that percentage reaches a certain value?


(Alexander Reelsen) #2

Hey,

That's not too hard. The important part is to get the query right, and that depends on what you want to alert on. I took this as an example:

GET topbeat-*/filesystem/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-1m",
              "lte": "now"
            }
          }
        },
        {
          "range" : {
            "fs.used_p" : {
              "gte" : 0.4
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_host": {
      "terms": {
        "field": "beat.hostname",
        "size": 10
      },
      "aggs": {
        "by_fs": {
          "terms": {
            "field": "fs.device_name",
            "size": 10
          },
          "aggs": {
            "max": {
              "max": {
                "field": "fs.used_p"
              }
            }
          }
        }
      }
    }
  }
}

which returns data like this:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "by_host": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "orca",
          "doc_count": 10,
          "by_fs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "/dev/disk1",
                "doc_count": 5,
                "max": {
                  "value": 0.49000000953674316
                }
              },
              {
                "key": "devfs",
                "doc_count": 5,
                "max": {
                  "value": 1
                }
              }
            ]
          }
        }
      ]
    }
  }
}

As you can see, I searched the data from my notebook for disks that are more than 40% full (fs.used_p is a fraction, so 0.4 means 40%), aggregated by host and disk, and returned the maximum disk usage for each.
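
To make the shape of that response a bit more concrete, here is a minimal Python sketch that walks the nested by_host / by_fs buckets and flattens them into (host, device, max usage) rows. The function name flatten_usage and the trimmed sample dictionary are assumptions made for illustration; they are not part of Topbeat, Elasticsearch, or Watcher.

```python
def flatten_usage(response):
    """Flatten nested by_host/by_fs buckets into (host, device, max_used) tuples."""
    rows = []
    for host in response["aggregations"]["by_host"]["buckets"]:
        for fs in host["by_fs"]["buckets"]:
            rows.append((host["key"], fs["key"], fs["max"]["value"]))
    return rows


# Trimmed copy of the search response: only the fields the function reads.
sample = {
    "aggregations": {
        "by_host": {
            "buckets": [
                {
                    "key": "orca",
                    "by_fs": {
                        "buckets": [
                            {"key": "/dev/disk1", "max": {"value": 0.49000000953674316}},
                            {"key": "devfs", "max": {"value": 1}},
                        ]
                    },
                }
            ]
        }
    }
}

for host, device, used in flatten_usage(sample):
    print(host, device, used)
```

Each row corresponds to one inner by_fs bucket, which is the same nesting the email body template in a watch has to iterate over.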

You can easily combine this into a fully fledged watch:

PUT _watcher/watch/free_space
{
  "metadata": {
    "used_percent": 0.4
  },
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "<topbeat-{now/d}>"
        ],
        "types": [
          "filesystem"
        ],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-1m",
                      "lte": "now"
                    }
                  }
                },
                {
                  "range": {
                    "fs.used_p": {
                      "gte": 0.4
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "by_host": {
              "terms": {
                "field": "beat.hostname",
                "size": 100
              },
              "aggs": {
                "by_fs": {
                  "terms": {
                    "field": "fs.device_name"
                  },
                  "aggs": {
                    "max": {
                      "max": {
                        "field": "fs.used_p"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gte": 5
      }
    }
  },
  "actions": {
    "email_alert": {
      "throttle_period": "15m",
      "email": {
        "to": "user@example.org",
        "subject": "Watcher {{ctx.watch_id}} at {{ctx.trigger.triggered_time}}: Filesystem Usage Alert for some hosts",
        "body": "{{#ctx.payload.aggregations.by_host.buckets}}Host {{key}} with {{#by_fs.buckets}}[{{key}}/{{max.value}}]{{/by_fs.buckets}}\n{{/ctx.payload.aggregations.by_host.buckets}}"
      }
    }
  }
}

If you look at the email you receive, you will see that the output is pretty raw. It is worth investing some time in a transform that produces nicer values for the output.
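
As a rough idea of what such a reshaping could look like, here is a small Python sketch that turns the by_host buckets into one readable line per host, with the usage fractions converted to whole percentages. This only illustrates the logic; it is not Watcher transform syntax, and the function name format_alert_lines and the sample buckets are assumptions made up for this example. In a real watch the same logic would live in a transform script, whose language depends on your Watcher version.

```python
def format_alert_lines(buckets):
    """Build one human-readable line per host from the by_host buckets."""
    lines = []
    for host in buckets:
        parts = [
            # fs.used_p is a fraction (0.49 means 49%), so scale it to a percentage.
            "{} at {:.0f}%".format(fs["key"], fs["max"]["value"] * 100)
            for fs in host["by_fs"]["buckets"]
        ]
        lines.append("Host {}: {}".format(host["key"], ", ".join(parts)))
    return lines


# Sample buckets in the same shape as aggregations.by_host.buckets.
example_buckets = [
    {
        "key": "orca",
        "by_fs": {
            "buckets": [
                {"key": "/dev/disk1", "max": {"value": 0.49000000953674316}},
                {"key": "devfs", "max": {"value": 1}},
            ]
        },
    }
]

print("\n".join(format_alert_lines(example_buckets)))
# prints: Host orca: /dev/disk1 at 49%, devfs at 100%
```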

But this is the basic idea...

Hope this helps...

--Alex

