Alert rule with relative threshold

Hi, I'm new to elasticsearch and would appreciate your help setting up the following alert rule:
I have a metric counter named "DuckError". I want to have an alert when its sum over one interval is over X% than its sum over a previous interval (for example, its sum today vs. its sum yesterday).
I thought about using Elasticsearch query rule with a DSL query that defines 3 fields (sum yesterday, sum today and the ratio), and returns 1 document if the ratio is above 1.3 (for example), or no documents otherwise. I couldn't create such DSL query though.
Is this the right direction? How should this query look like?
Thanks

Sometimes I get a bit stuck on terminology, but what do you specifically mean by a "metric counter" here?

My assumption is a monotonically increase integer, that is counting something. e.g. DuckError was say =200 3 days ago, then =250 2 days ago, then =300 yesterday, and say 400 today. ( You likely have data points at significantly more than once per day. )

e.g. prometheus' definition for counter is:

Counter

A counter is a cumulative metric that represents a single, monotonically increasing counter whose value can only increase or be reset to zero on restart

If so, then summing up those values is not really the right way to go, you generally want to look at rate.

Certainly you can use rate aggregations in alerts, its documented here:

As ever, if you share a few sample documents (limited to the fields of interest) it's easier to understand the question, if I've misunderstood here.

Thanks for the reply!

On each document the field's value is a positive integer. The value is not aggregative, I want to compare the sum of values during different ingervals

I feel confident there is an easier way, the dummy aggr below is redundant but I couldn't;t get syntax right without it, and maybe ES|QL can do this much easier (todo to learn that better), but ... using this silly sample data

DELETE /test
POST _bulk
{ "index" : { "_index" : "test", "_id" : "0" } }
{ "@timestamp": "2025-02-15T00:00:00Z", "field1" : 10 }
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "@timestamp": "2025-02-15T03:00:00Z", "field1" : 15 }
{ "index" : { "_index" : "test", "_id" : "2" } }
{ "@timestamp": "2025-02-15T06:00:00Z", "field1" : 10 }
{ "index" : { "_index" : "test", "_id" : "3" } }
{ "@timestamp": "2025-02-15T09:00:00Z", "field1" : 15 }
{ "index" : { "_index" : "test", "_id" : "4" } }
{ "@timestamp": "2025-02-15T12:00:00Z", "field1" : 10 }
{ "index" : { "_index" : "test", "_id" : "5" } }
{ "@timestamp": "2025-02-15T15:00:00Z", "field1" : 14 }
{ "index" : { "_index" : "test", "_id" : "6" } }
{ "@timestamp": "2025-02-15T18:00:00Z", "field1" : 10 }
{ "index" : { "_index" : "test", "_id" : "7" } }
{ "@timestamp": "2025-02-15T21:00:00Z", "field1" : 16 }
{ "index" : { "_index" : "test", "_id" : "10" } }
{ "@timestamp": "2025-02-16T00:00:00Z", "field1" : 17 }
{ "index" : { "_index" : "test", "_id" : "11" } }
{ "@timestamp": "2025-02-16T03:00:00Z", "field1" : 20 }
{ "index" : { "_index" : "test", "_id" : "12" } }
{ "@timestamp": "2025-02-16T06:00:00Z", "field1" : 17 }
{ "index" : { "_index" : "test", "_id" : "13" } }
{ "@timestamp": "2025-02-16T09:00:00Z", "field1" : 20 }
{ "index" : { "_index" : "test", "_id" : "14" } }
{ "@timestamp": "2025-02-16T12:00:00Z", "field1" : 17 }
{ "index" : { "_index" : "test", "_id" : "15" } }
{ "@timestamp": "2025-02-16T15:00:00Z", "field1" : 20 }
{ "index" : { "_index" : "test", "_id" : "16" } }
{ "@timestamp": "2025-02-16T18:00:00Z", "field1" : 17 }
{ "index" : { "_index" : "test", "_id" : "17" } }
{ "@timestamp": "2025-02-16T21:00:00Z", "field1" : 20 }

then this gives the right answer

POST /test/_search
{
  "size": 0,
  "aggs": {
    "dummy": {
      "terms": {
        "size": 1,
        "field": "_index"
      },
      "aggs": {
        "date_period1": {
          "filter": {
            "range": {
              "@timestamp": {
                "gte": "now-2d/d",
                "lt": "now-1d/d"
              }
            }
          },
          "aggs": {
            "field_sum": {
              "sum": {
                "field": "field1"
              }
            }
          }
        },
        "date_period2": {
          "filter": {
            "range": {
              "@timestamp": {
                "gte": "now-1d/d",
                "lt": "now/d"
              }
            }
          },
          "aggs": {
            "field_sum": {
              "sum": {
                "field": "field1"
              }
            }
          }
        },
        "percent_inc": {
          "bucket_script": {
            "buckets_path": {
              "firstPeriod": "date_period1>field_sum",
              "secondPeriod": "date_period2>field_sum"
            },
            "script": "(params.secondPeriod-params.firstPeriod)*100/params.firstPeriod"
          }
        }
      }
    }
  }
}

which is

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 16,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "dummy": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "test",
          "doc_count": 16,
          "date_period1": {
            "doc_count": 8,
            "field_sum": {
              "value": 100
            }
          },
          "date_period2": {
            "doc_count": 8,
            "field_sum": {
              "value": 148
            }
          },
          "percent_inc": {
            "value": 48
          }
        }
      ]
    }
  }
}
2 Likes

Thank you very much for the answer!
The response seem to contain the value I need under "percent_inc", but I cannot refer to it when setting the threshold in the Elasticsearch query rule creation page.
Eventually I want to alert when "percent_inc" > 1.3 for example, and I'm not sure how that's possible

Watcher is a paid feature, I dont have a license, so dont know.

Alerts seems not include a generic "query and aggregate like this, look at that field in the response" support, that I can see. Happy to be corrected.

So one idea would be to use a transform, and stick the value of percent_inc in another index that you can then monitor.

However, I believe there are easier ways to achieve the same thing. This approach seems overly complex even to me. I hope someone else makes a better suggestion.