Alerting based on anomaly duration

There are probably many ways to solve this, but here's one approach (see the example):

  1. Run once per day and look back over the last 24 hours (the time range in the example must be changed here, because the example uses old data rather than live data)
  2. Filter your query by job_id and result_type:record
  3. Do a terms aggregation on the partition field
  4. Do a date_histogram sub-aggregation with an interval that matches the bucket_span of the job (the example uses a 1m bucket span because its data set was chosen to have consecutive anomalous buckets, so you would need to change this to 15m)
  5. Use the moving_fn aggregation to compute a 3-bucket sliding-window sum of the record_score
  6. Use a bucket_selector aggregation to eliminate any individual values where the record_score is below some arbitrary value (I chose 40).
  7. The condition script loops through the results and checks whether any 3-bucket sliding-window sum of the record_score is greater than some arbitrary value (I chose 120)
  8. In the actions section, gather up all of the partitions that violated the threshold and print them with the latest timestamp at which they violated (obviously use your preferred action method)

An example output is:

          Anomalies:
          ==========
          AAL had 3 anomalies in a row at 2021-02-10T12:32:00.000Z
          AWE had 3 anomalies in a row at 2021-02-10T19:19:00.000Z
          AMX had 3 anomalies in a row at 2021-02-10T22:10:00.000Z
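
To make the logic of steps 5 to 8 easier to follow, here is a rough offline sketch in Python. It is not the watch itself and does not call Elasticsearch; it only mimics the sliding-window check on data you have already fetched. The function names, the input shape (partition mapped to time-ordered, contiguous buckets) and the sample scores are all made up for illustration. It also assumes that every bucket in the window must clear the per-bucket floor, which is a slightly stricter reading than a plain sum.

```python
def find_sustained_anomalies(series, min_score=40.0, window=3, sum_threshold=120.0):
    """Return {partition: latest timestamp} where `window` consecutive buckets
    each score at least `min_score` and together sum to more than `sum_threshold`.

    `series` is a hypothetical input shape: partition -> list of
    (timestamp, record_score) tuples, one per bucket, time-ordered, no gaps.
    """
    violations = {}
    for partition, buckets in series.items():
        # Analogue of step 6: scores below the floor count as "no anomaly".
        scores = [score if score >= min_score else 0.0 for _, score in buckets]
        for end in range(window - 1, len(scores)):
            win = scores[end - window + 1 : end + 1]
            # Analogue of steps 5 and 7: sliding-window sum against a threshold.
            # Assumption of this sketch: all buckets in the window must be anomalous.
            if all(s > 0 for s in win) and sum(win) > sum_threshold:
                # Later windows overwrite earlier ones, keeping the latest timestamp.
                violations[partition] = buckets[end][0]
    return violations


def format_alert(violations, window=3):
    """Analogue of step 8: one line per violating partition."""
    lines = ["Anomalies:", "=========="]
    for partition, timestamp in violations.items():
        lines.append(f"{partition} had {window} anomalies in a row at {timestamp}")
    return "\n".join(lines)


# Invented sample data, 1m buckets, for illustration only.
sample = {
    "AAL": [
        ("2021-02-10T12:30:00.000Z", 55.0),
        ("2021-02-10T12:31:00.000Z", 61.0),
        ("2021-02-10T12:32:00.000Z", 48.0),
        ("2021-02-10T12:33:00.000Z", 10.0),
    ],
    "XYZ": [
        ("2021-02-10T09:00:00.000Z", 90.0),
        ("2021-02-10T09:01:00.000Z", 5.0),
        ("2021-02-10T09:02:00.000Z", 95.0),
    ],
}

print(format_alert(find_sustained_anomalies(sample)))
```

With the sample data only AAL is reported: XYZ has two high scores, but the bucket between them falls below the floor, so it is not three in a row. The thresholds (40 and 120) are the same arbitrary values as in the steps above and should be tuned for your own job.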