How to search for anomalies in metrics?

I want to create a small script that will query ES for data and then send me an alert when it sees "anomalous" data.

For example, if the CPU load on a VM suddenly spikes, I'd like an alert. That's easy enough to do if I just set a threshold, but I'd rather check for a sudden change in load/usage, since some VMs will naturally have a high CPU load or RAM usage while others will not.

I am digging into Elasticsearch Query DSL and the various aggregations to try and create my own script for this. Basically, run a query, check for a condition, and then send an alert, or not.
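
The rough shape I'm picturing is something like this (just an untested sketch; the URL, index pattern, and alert hook are placeholders, and looks_anomalous() is exactly the part I haven't figured out yet):

# Rough skeleton of the "run a query, check a condition, alert" flow.
import requests

ES_URL = "http://localhost:9200"   # placeholder
INDEX = "metricbeat-*"             # placeholder index pattern

def looks_anomalous(value):
    # Placeholder condition -- this is the part I'm asking about.
    return value is not None and value > 4.0

def send_alert(message):
    # Placeholder: could be an email, a webhook, whatever.
    print("ALERT:", message)

body = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-30m", "lte": "now"}}},
    "aggs": {"avg_load_1": {"avg": {"field": "system.load.1"}}},
}

resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=body, timeout=30)
resp.raise_for_status()
avg_load = resp.json()["aggregations"]["avg_load_1"]["value"]

if looks_anomalous(avg_load):
    send_alert(f"avg system.load.1 over the last 30m is {avg_load:.2f}")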

All my search results on this topic end up at proprietary solutions or Elastalert. I have no budget for this, and my attempts at getting Elastalert working were not successful, though I may revisit it once I understand how to search ES better.

Here are a few specific things I want to watch for:

A) If CPU load has gone up by more than 200% in the past 30 minutes.
B) If RAM usage has gone up by more than 200% in the past 30 minutes.
C) If the number of Apache requests has suddenly gone down by more than 50% in the past 60 minutes.

How would you go about watching for those things?
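
For A), one idea I had was to pull both windows in a single search with two filter sub-aggregations and compare the averages in the script, something like this (again an untested sketch; the index pattern is a placeholder, and I'm reading "gone up by more than 200%" as "current average is more than 3x the previous one"):

# Sketch for A): compare average CPU load over the last 30 minutes against the
# 30 minutes before that, using two filter sub-aggregations in one search.
import requests

ES_URL = "http://localhost:9200"   # placeholder
INDEX = "metricbeat-*"             # placeholder index pattern

body = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"match_phrase": {"host.name": "learnescentos7"}},
                {"match_phrase": {"metricset.name": "load"}},
                {"range": {"@timestamp": {"gte": "now-60m", "lte": "now"}}},
            ]
        }
    },
    "aggs": {
        "previous_30m": {
            "filter": {"range": {"@timestamp": {"gte": "now-60m", "lt": "now-30m"}}},
            "aggs": {"avg_load_1": {"avg": {"field": "system.load.1"}}},
        },
        "current_30m": {
            "filter": {"range": {"@timestamp": {"gte": "now-30m", "lte": "now"}}},
            "aggs": {"avg_load_1": {"avg": {"field": "system.load.1"}}},
        },
    },
}

resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=body, timeout=30)
resp.raise_for_status()
aggs = resp.json()["aggregations"]
previous = aggs["previous_30m"]["avg_load_1"]["value"]
current = aggs["current_30m"]["avg_load_1"]["value"]

# "Gone up by more than 200%" read as: current average is more than 3x the previous one.
if previous and current and current > 3 * previous:
    print(f"ALERT: avg system.load.1 went from {previous:.2f} to {current:.2f}")

I'm guessing B) would be the same idea with the RAM usage field swapped in, and C) the same idea with 60-minute windows and the comparison flipped, but I don't know if this is a reasonable approach in the first place.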

In my research, I discovered the Median Absolute Deviation aggregation. Would watching that be a way to get close to what I'm after?

I was able to build this query:

GET /_search
{
 "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "filter": [
        {
          "match_phrase": {
            "host.name": "learnescentos7"
          }
        },
        {
          "match_phrase": {
            "agent.type": "metricbeat"
          }
        },
        {
          "match_phrase": {
            "metricset.name": "load"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-30m",
              "lte": "now",
              "time_zone": "America/Los_Angeles"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "aggs": {
    "avg_load_1": { "avg": { "field": "system.load.1" }},
    "max_load_1": { "max": { "field": "system.load.1" }},
    "min_load_1": { "min": { "field": "system.load.1" }},
    "variability_1": { "median_absolute_deviation": { "field": "system.load.1" }},
    "avg_load_5": { "avg": { "field": "system.load.5" }},
    "max_load_5": { "max": { "field": "system.load.5" }},
    "min_load_5": { "min": { "field": "system.load.5" }},
    "variability_5": { "median_absolute_deviation": { "field": "system.load.5" }},
    "avg_load_15": { "avg": { "field": "system.load.15" }},
    "max_load_15": { "max": { "field": "system.load.15" }},
    "min_load_15": { "min": { "field": "system.load.15" }},
    "variability_15": { "median_absolute_deviation": { "field": "system.load.15" }}
  }
}

Would alerting on variability_1 being larger than 0 get me anywhere close to knowing whether CPU load has gone up? I think I might be off track here, since as far as I can tell that aggregation only measures how spread out the values in the window are, so it would react to load going down just as much as load going up.

Any advice would be welcome. 🙂 Thanks!
