How do I find these anomalies?

I have an index that contains the response time of a web service, and I am unable to build an ML job that finds the anomalies I am interested in. The service slowed down. Actually, it slows down quite often, so when I try to use high_sum, mean, or high_mean, the job finds anomalies everywhere. The anomalies I would like to find are where the minimum value of a field is unusually large, but a min job only finds anomalies when the minimum value is lower than normal.

In this graph I think the anomalies are on the right-hand side, where the minimum is two orders of magnitude higher than normal. But ML finds them on the left, when things happen even faster than normal. Is min really handled that differently from the other functions?

[screenshot of the ML job results chart]

Hello,

The min function will only find anomalies on the "low" side of things, never when a metric is higher than typical. This is consistent with the behavior of all of the other "one-sided" functions (like low_count, high_distinct_count, etc.).

You really should be using max in the case above to catch those spikes. The spikes shown in your picture will clearly be assigned a critical (red) anomaly score (more than 75). Remember that anomalies are scored in a relative way, so early in the analysis the small-ish deviations might receive significant scores. As more behavior is observed (like these giant spikes), those smaller deviations will be downgraded and the bigger anomalies will be scored appropriately.
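For reference, a minimal sketch of such a job (the job name and the responsetime field name here are placeholders, not taken from your setup):

PUT _xpack/ml/anomaly_detectors/responsetime-high-spikes
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [{
      "function": "max",
      "field_name": "responsetime"
    }]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}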

But in terms of max, the values for those time periods are not particularly anomalous. They are all blue. These are response times, and slow responses are a matter of routine. The anomaly here was that every response was slow, so the minimum was unusually high. Here is max run over the same time period.

Hmmm... I see that your plot of max values is less dramatic than the plot of min values, but I still would have thought those anomalies would be scored higher, judging from the history of the plot visible in the screenshot (unless past history also has such dramatic spikes).

Anyway, one other possible idea would be to feed the ML job min-aggregated samples per bucket, but still run a max function on them.

Take a look at this doc which shows you how to pass aggregated data from Elasticsearch to ML: https://www.elastic.co/guide/en/x-pack/5.5/ml-configuring-aggregation.html

In your query agg, you can do a min aggregation on the responsetime field (instead of the avg aggregation shown in the example). This means that, for every bucket, only the minimum value is passed to ML.

From that point, ML can still do a max analysis on these min values!
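As a rough sketch of the datafeed aggregation (following the structure of the farequote example in that doc, with min swapped in for avg and the airline split omitted; the field names are the example's, not yours):

"aggregations": {
  "buckets": {
    "date_histogram": {
      "field": "time",
      "interval": "300s"
    },
    "aggregations": {
      "time": {
        "max": {"field": "time"}
      },
      "responsetime": {
        "min": {"field": "responsetime"}
      }
    }
  }
}

The job's detector then stays "function": "max", "field_name": "responsetime".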

I tried it on a sample data set that I have and it works...


Never mind ...

I am unable to get this to work. I want to find anomalies in the min of a field called timetaken, split by a field called clones.

Based on the documentation you pointed to, I tried this for the job

PUT _xpack/ml/anomaly_detectors/ihs-timetaken-min-agg
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [{
      "function":"max",
      "field_name":"min.timetaken",
      "by_field_name":"clones"
    }],
    "summary_count_field_name": "doc_count"
  },
  "data_description": {
    "time_field":"max.timestamp"
  }
}

and this for the datafeed

PUT _xpack/ml/datafeeds/ihs-timetaken-min-agg-feed
{
  "job_id":"ihs-timetaken-min-agg",
  "indices": ["copy-ihs-logstash-2017.09.1*"],
  "types": [
        "_default_",
        "access-log"
  ],
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "300s",
        "time_zone": "UTC"
      },
      "aggregations": {
        "max.timestamp": {
          "max": {"field": "@timestamp"}
        },
        "clones": {
          "terms": {
            "field": "clones",
            "size": 34
          },
          "aggregations": {
            "min.timetaken": {
              "min": {
                "field": "web.timetaken"
              }
            }
          }
        }
      }
    }
  }
}

Where access-log is the _type of the documents in those indexes. If I _search that aggregation with the query
  "query": {
    "match_all": {
      "boost": 1
    }

then the aggregation data that comes back looks OK to me.

  "aggregations": {
    "buckets": {
      "buckets": [
        {
          "key_as_string": "2017-09-11T23:55:00.000Z",
          "key": 1505174100000,
          "doc_count": 2,
          "clones": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "1stkey",
                "doc_count": 1,
                "min.timetaken": {
                  "value": 87281
                }
              },
              {
                "key": "2ndkey",
                "doc_count": 1,
                "min.timetaken": {
                  "value": 30439
                }
              }
            ]
          },
          "max.timestamp": {
            "value": 1505174398000,
            "value_as_string": "2017-09-11T23:59:58.000Z"
          }
        }, [...]

If I then run the job in Kibana from the start of the data to now, I get nothing back and Elasticsearch logs the following:

[2017-10-02T14:04:17,308][INFO ][o.e.x.m.d.DatafeedManager] Starting datafeed [ihs-timetaken-min-agg-feed] for job [ihs-timetaken-min-agg] in [1970-01-01T00:00:00.000Z, 2017-10-02T18:04:15.001Z)
[2017-10-02T14:04:17,312][WARN ][o.e.x.m.d.DatafeedManager] [ihs-timetaken-min-agg] Datafeed lookback retrieved no data

Any suggestions?

Hi @richcollier

Well, it took a while, but having gotten the farequote job to run, I found what I think the issue is. That document states:

In this example, the airline, responsetime, and time fields are aggregations.

which can only be referring to

"data_description": {
"time_field":"time"
}

Having both time_field and the aggregation name be max.timestamp is sufficient to avoid a "missing max aggregation for time_field" error, but not sufficient to avoid "Datafeed lookback retrieved no data". For that I needed to modify my Logstash index-to-index copy to add_field @timestamp into the max.timestamp field. Once every document has max.timestamp, the job and datafeed above work just fine. Of course, at that point max.timestamp no longer makes sense as a name :)

But the job runs and I finally get a severity of 76. It does not plot the actual values, but I can live with that.

Thanks for the assistance!

Hi,

There is a requirement that the max time aggregation is named the same as the actual time field. We will make sure to update the documentation and the validations to reflect this.

So, in your case, you do not need to copy @timestamp into a max.timestamp field. You simply need to name the max time aggregation the same as its field: @timestamp.
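Applied to the datafeed you posted, the change would look something like this (only the name of the max time aggregation and the job's time_field change; everything else stays as you posted it):

"aggregations": {
  "buckets": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "300s",
      "time_zone": "UTC"
    },
    "aggregations": {
      "@timestamp": {
        "max": {"field": "@timestamp"}
      },
      "clones": {
        "terms": {"field": "clones", "size": 34},
        "aggregations": {
          "min.timetaken": {
            "min": {"field": "web.timetaken"}
          }
        }
      }
    }
  }
}

and in the job:

"data_description": {
  "time_field": "@timestamp"
}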

Kind regards,
Dimitris

