Metric Threshold Alert reporting incorrect document count

Okay, wanted to give a quick update here. I upgraded our Elasticsearch deployment to 8.11 and I'm really excited about all the new features, mainly the custom threshold alert. Playing with that now to see if it works better.

Back to the metric threshold, this is what I have for my data view:

Now my rule looks like this:

What I see in the graph, and even in Metrics Explorer, actually looks accurate. That's about what I expected. But the document count is still reported as some large number. Unfortunately, my alerts now fire when they really shouldn't, because they're using that count and not what we're seeing here.

Okay, final update on this. I deleted all of my rules and recreated them (exact same settings). They're now firing off properly and reporting the correct document count.

It looks like old rules don't retroactively update their data view when the settings under Infrastructure are changed. Seems to be a bug in that case.

The custom threshold is fantastic! Worked perfectly for my case as well.

Good to hear,
Thanks for posting your findings / solutions

Wow 8.11 ... right to the top, this is a BIG release.
Keep an eye out for an 8.11.1 etc., as there is usually a patch release in the next couple of weeks... I would apply that when it comes out.

BTW the guidance I was given is if you see this

(rum-data-view)*

You should remove that. It is a "leftover"

@stephenb Okay, circling back on this again as I'm convinced I have another bug here. When I create the rule through the Rules page or through the Kibana REST API, I get the same issue outlined above. However, if I create the rule through Metrics Explorer, it gives me the correct document count. It's a very confusing situation since there's nothing noticeably different between the rules I create through Metrics Explorer and the Rules page.

On second thought, I don't actually think the original issue is solved at all.

I would then GET that working rule via the API and see how it compares to the others when you GET them.
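Something along these lines should work (hypothetical Kibana URL, API key, and rule IDs; the endpoint is GET /api/alerting/rule/<id>):

```ts
// Sketch: pull both rule definitions so you can diff them side by side.
const KIBANA_URL = 'https://my-kibana.example.com';            // placeholder
const headers = { Authorization: `ApiKey ${process.env.KIBANA_API_KEY}` };

async function getRule(id: string) {
  const res = await fetch(`${KIBANA_URL}/api/alerting/rule/${id}`, { headers });
  return res.json();
}

const working = await getRule('<id-of-rule-created-in-metrics-explorer>');
const broken = await getRule('<id-of-rule-created-via-api>');

// Compare params in particular -- that's where the filter and criteria live.
console.log(JSON.stringify(working.params, null, 2));
console.log(JSON.stringify(broken.params, null, 2));
```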

If you can create a repeatable case I would encourage you to open an issue.

Good callout on the GET. It looks like the difference between the two is the consumer. The working one has a consumer of "infrastructure" while the other one is "alerts". We're actually creating our alerts through the API, so I'm going to change it there as part of our request body, along the lines of the sketch below.
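Roughly like this (hypothetical Kibana URL and API key; everything except consumer trimmed for brevity):

```ts
// Sketch: create the rule via the alerting API with consumer set explicitly.
const KIBANA_URL = 'https://my-kibana.example.com';            // placeholder

await fetch(`${KIBANA_URL}/api/alerting/rule`, {
  method: 'POST',
  headers: {
    'kbn-xsrf': 'true',                                        // required for Kibana write APIs
    'Content-Type': 'application/json',
    Authorization: `ApiKey ${process.env.KIBANA_API_KEY}`,
  },
  body: JSON.stringify({
    name: 'PlayNiceP',
    rule_type_id: 'metrics.alert.threshold',
    consumer: 'infrastructure',                                // was "alerts" before
    schedule: { interval: '5m' },
    params: { /* criteria, filters, etc. unchanged */ },
  }),
});
```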

@stephenb

Okay, that didn't work. The consumer is set to infrastructure now, but I'm still seeing the same issue.

Here are details for the two alerts:

```json
{
        "id": "debd0950-7e78-11ee-83ed-cbbbc46eadd0",
        "name": "PlayNiceP",
        "tags": [],
        "enabled": true,
        "consumer": "infrastructure",
        "throttle": null,
        "revision": 0,
        "running": false,
        "schedule": {
            "interval": "5m"
        },
        "params": {
            "criteria": [
                {
                    "comparator": ">",
                    "threshold": [
                        1
                    ],
                    "timeSize": 1,
                    "timeUnit": "h",
                    "aggType": "count"
                }
            ],
            "sourceId": "default",
            "alertOnNoData": true,
            "alertOnGroupDisappear": true,
            "filterQueryText": "labels.http_route: \"/pos/order/{orderId}/{version}/void\" and url.path : *",
            "filterQuery": "{\"bool\":{\"filter\":[{\"bool\":{\"should\":[{\"term\":{\"labels.http_route\":{\"value\":\"/pos/order/{orderId}/{version}/void\"}}}],\"minimum_should_match\":1}},{\"bool\":{\"should\":[{\"exists\":{\"field\":\"url.path\"}}],\"minimum_should_match\":1}}]}}"
        },
        "rule_type_id": "metrics.alert.threshold",
        "created_by": "2804626948",
        "updated_by": "2804626948",
        "created_at": "2023-11-08T20:53:30.310Z",
        "updated_at": "2023-11-08T20:53:30.310Z",
        "api_key_owner": "2804626948",
        "notify_when": null,
        "mute_all": false,
        "muted_alert_ids": [],
        "scheduled_task_id": "debd0950-7e78-11ee-83ed-cbbbc46eadd0",
        "execution_status": {
            "status": "active",
            "last_execution_date": "2023-11-08T20:53:45.926Z",
            "last_duration": 273
        },
        "actions": [
            {
                "group": "metrics.threshold.fired",
                "id": "elastic-cloud-email",
                "params": {
                    "message": "{{context.reason}}\n\n{{rule.name}} is active with the following conditions:\n\n- Affected: {{context.group}}\n- Metric: {{context.metric}}\n- Observed value: {{context.value}}\n- Threshold: {{context.threshold}}\n\n[View alert details]({{context.alertDetailsUrl}})\n",
                    "to": [
                        "vlad@gmail.com"
                    ],
                    "subject": "Working Test"
                },
                "connector_type_id": ".email",
                "frequency": {
                    "summary": false,
                    "notify_when": "onActionGroupChange",
                    "throttle": null
                },
                "uuid": "f459c4a4-095f-4376-8c79-e1b35b8b8616"
            }
        ],
        "last_run": {
            "alerts_count": {
                "active": 1,
                "new": 0,
                "recovered": 0,
                "ignored": 0
            },
            "outcome_msg": null,
            "outcome_order": 0,
            "outcome": "succeeded",
            "warning": null
        },
        "next_run": "2023-11-08T20:58:45.863Z",
        "api_key_created_by_user": false
}
```

```json
{
        "id": "5041aa00-7e78-11ee-83ed-cbbbc46eadd0",
        "name": "Infrastructure",
        "tags": [],
        "enabled": true,
        "consumer": "infrastructure",
        "throttle": null,
        "revision": 2,
        "running": false,
        "schedule": {
            "interval": "30m"
        },
        "params": {
            "criteria": [
                {
                    "comparator": ">=",
                    "timeSize": 1,
                    "aggType": "count",
                    "threshold": [
                        1
                    ],
                    "timeUnit": "h"
                }
            ],
            "sourceId": "default",
            "alertOnNoData": true,
            "alertOnGroupDisappear": true,
            "groupBy": [
                "labels.storeName",
                "labels.retailer"
            ],
            "filterQueryText": "labels.http_route: \"/pos/order/{orderId}/{version}/void\" and url.path : *"
        },
        "rule_type_id": "metrics.alert.threshold",
        "created_by": "ruleMaker",
        "updated_by": "2804626948",
        "created_at": "2023-11-08T20:49:31.300Z",
        "updated_at": "2023-11-08T20:50:15.827Z",
        "api_key_owner": "2804626948",
        "notify_when": null,
        "mute_all": false,
        "muted_alert_ids": [],
        "scheduled_task_id": "5041aa00-7e78-11ee-83ed-cbbbc46eadd0",
        "execution_status": {
            "status": "active",
            "last_execution_date": "2023-11-08T20:50:24.939Z",
            "last_duration": 444
        },
        "actions": [
            {
                "group": "metrics.threshold.fired",
                "id": "8f203190-7d54-11ed-a2f3-7763c1be2fed",
                "params": {
                    "body": "{\"alertName\": \"{{rule.name}}\",\"reason\":\"{{context.reason}}\",\"group\":\"{{context.group}}\"}"
                },
                "connector_type_id": ".webhook",
                "frequency": {
                    "summary": false,
                    "notify_when": "onActiveAlert",
                    "throttle": null
                },
                "uuid": "01262eae-49d1-4a9e-9e0b-86d0f7f12870"
            },
            {
                "group": "metrics.threshold.fired",
                "id": "elastic-cloud-email",
                "params": {
                    "message": "{{context.reason}}\n\n{{rule.name}} is active with the following conditions:\n\n- Affected: {{context.group}}\n- Metric: {{context.metric}}\n- Observed value: {{context.value}}\n- Threshold: {{context.threshold}}\n\n[View alert details]({{context.alertDetailsUrl}})\n",
                    "to": [
                        "vlad@gmail.com"
                    ],
                    "subject": "Test"
                },
                "connector_type_id": ".email",
                "frequency": {
                    "summary": false,
                    "notify_when": "onActiveAlert",
                    "throttle": null
                },
                "uuid": "96a40134-9efe-4e58-aa03-3ad150fbb8bc"
            }
        ],
        "last_run": {
            "alerts_count": {
                "active": 1,
                "new": 0,
                "recovered": 0,
                "ignored": 0
            },
            "outcome_msg": null,
            "outcome_order": 0,
            "outcome": "succeeded",
            "warning": null
        },
        "next_run": "2023-11-08T21:20:24.874Z",
        "api_key_created_by_user": false
}
```

Now, there's hardly any difference between the two. But the one I created in Metrics Explorer works just fine while the other doesn't. I'm completely out of ideas now. This is such strange behavior.

EDIT:

I think I just got a lead on something. The number it's reporting is close to what I would get if I didn't apply the filter query at all. Comparing those two JSONs, it looks like the working one has a filterQuery field while the other one doesn't. Trying to figure out if we need to pass that as part of the body in our request to the Kibana API.

I formatted your code. You just put three backticks before and after it; edit that post and you'll see what I did, which makes it much easier to read.

Second, you didn't show the actual commands, so I don't know which is which...

But I agree. If the filters are not the same, I would hone in on that.

I see a query filter field in both, but one is KQL and one is different.

Thank you, Stephen. The missing filterQuery appears to be the issue after all. I added that to my request body when creating rules through the Kibana API and it worked.

Ohh, I see that...

filterQueryText is in both, but

filterQuery is only in one...

Exactly which alert type is it missing in? And is it missing when you create it through the UI or API or both?

filterQuery is missing in the one that's not working, so that rule is behaving as if there's no filter at all, hence the large numbers. When creating the rule, we had missed adding that field to the request body. The rule would get created, but it wouldn't behave correctly. Adding the proper value to that field fixed my issue! We did have filterQueryText, but that alone wasn't enough, it seems. The corrected create request is sketched below.
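For anyone else running into this, here's roughly what our create request looks like now (hypothetical Kibana URL and API key; the params mirror the working rule above). The key part is sending filterQuery, the serialized Elasticsearch query DSL, alongside filterQueryText:

```ts
// Sketch: creating the rule via the Kibana alerting API, now including filterQuery.
const KIBANA_URL = 'https://my-kibana.example.com';            // placeholder

const filterQueryText =
  'labels.http_route: "/pos/order/{orderId}/{version}/void" and url.path : *';

// The serialized Elasticsearch query DSL equivalent of the KQL above --
// this is the field that was missing from our original request body.
const filterQuery = JSON.stringify({
  bool: {
    filter: [
      {
        bool: {
          should: [
            { term: { 'labels.http_route': { value: '/pos/order/{orderId}/{version}/void' } } },
          ],
          minimum_should_match: 1,
        },
      },
      {
        bool: {
          should: [{ exists: { field: 'url.path' } }],
          minimum_should_match: 1,
        },
      },
    ],
  },
});

const response = await fetch(`${KIBANA_URL}/api/alerting/rule`, {
  method: 'POST',
  headers: {
    'kbn-xsrf': 'true',
    'Content-Type': 'application/json',
    Authorization: `ApiKey ${process.env.KIBANA_API_KEY}`,
  },
  body: JSON.stringify({
    name: 'PlayNiceP',
    rule_type_id: 'metrics.alert.threshold',
    consumer: 'infrastructure',
    schedule: { interval: '5m' },
    params: {
      criteria: [{ comparator: '>', threshold: [1], timeSize: 1, timeUnit: 'h', aggType: 'count' }],
      sourceId: 'default',
      alertOnNoData: true,
      alertOnGroupDisappear: true,
      filterQueryText,
      filterQuery, // without this, the rule silently counts *all* documents
    },
  }),
});
console.log(await response.json());
```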

So you were always creating these via the API without filterQuery, but then looking at them in the UI? The UI showed the filterQueryText, which was confusing... it made it seem like the filter was set but not working.

Is that what was happening?

Hey @stephenb, yes it seems you have a pretty good understanding of the situation.

The rules created through Metrics Explorer were all working as intended. However, we create our rules through the Kibana API due to an integration with our React app. Looking at a rule from the UI, it looked fine since it showed the filterQueryText. Underneath, though, it was missing a filterQuery value, so in reality it wasn't filtering at all. I would have thought that filterQuery would be derived from filterQueryText, since Kibana didn't complain about the missing field, but that doesn't seem to be the case.
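If it helps anyone, one way we're thinking about keeping the two fields in sync is to compile the KQL in the app before sending it, assuming the @kbn/es-query package (the KQL parser Kibana itself uses) can be pulled into the build:

```ts
// Hypothetical helper: turn the KQL string into the DSL string that filterQuery expects.
// Note: without passing a data view, the conversion is not field-aware, but it was
// close enough to the DSL the UI generated for our filter.
import { fromKueryExpression, toElasticsearchQuery } from '@kbn/es-query';

export function kqlToFilterQuery(kql: string): string {
  const ast = fromKueryExpression(kql);               // parse the KQL into an AST
  return JSON.stringify(toElasticsearchQuery(ast));   // serialize the ES query DSL
}

// Usage: send both fields in params when creating the rule.
const filterQueryText =
  'labels.http_route: "/pos/order/{orderId}/{version}/void" and url.path : *';
const filterQuery = kqlToFilterQuery(filterQueryText);
```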

As always, thank you so much for taking the time to look at this issue with me
