Generate multiple alerts from an Elasticsearch query

I want to generate an alert whenever a pod is restarted in Kubernetes. I am planning to use kubernetes.container.status.restarts to identify whether pods have restarted. I would like a single Elasticsearch query that generates multiple alerts: one per restarted pod, per namespace. I know this is a fairly standard requirement, but I couldn't find any Elasticsearch query examples. I tried the following query for a specific container, but it generates only a single alert instead of individual ones. Can you please help? Here is my Elasticsearch alert query:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "kubernetes.container.name": "hostinterface"
          }
        },
        {
          "range": {
            "kubernetes.container.status.restarts": {
              "gt": 1
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_namespace": {
      "terms": {
        "field": "kubernetes.namespace",
        "size": 10
      }
    }
  }
}
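Kibana documents the Elasticsearch query rule as running the query, comparing the number of matches with a configured threshold, and scheduling actions when the condition is met. Because the check is on the overall match count, a terms aggregation in the query body does not on its own split the result into several alerts. The Python sketch below models that behaviour in simplified form; the function, the single alert id and the sample response are assumptions for illustration, not Kibana's implementation.

```python
from typing import Any


def es_query_rule_fires(response: dict[str, Any], threshold: int) -> list[str]:
    """Simplified model (assumption) of a hit-count threshold rule.

    Only the total hit count is compared with the threshold, so a run
    yields at most one alert, whatever buckets the aggregation returns.
    """
    total_hits = response["hits"]["total"]["value"]
    if total_hits > threshold:
        return ["query matched"]  # hypothetical single alert id
    return []


# Hypothetical response: three namespaces matched, but still one alert.
sample = {
    "hits": {"total": {"value": 42}},
    "aggregations": {
        "group_by_namespace": {
            "buckets": [
                {"key": "ns-a", "doc_count": 20},
                {"key": "ns-b", "doc_count": 12},
                {"key": "ns-c", "doc_count": 10},
            ]
        }
    },
}
print(es_query_rule_fires(sample, threshold=1))
```

The buckets are still present in the response; they are simply not what the threshold check looks at.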

Hi,

It seems like you're missing a sub-aggregation to group by pod name within each namespace. Try this:

"aggs": {
  "group_by_namespace": {
    "terms": {
      "field": "kubernetes.namespace",
      "size": 10
    },
    "aggs": {
      "group_by_pod": {
        "terms": {
          "field": "kubernetes.pod.name",
          "size": 10
        }
      }
    }
  }
}
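With the sub-aggregation, the response carries one pod bucket inside each namespace bucket. If you process the response yourself (for example in a script that calls the search API), walking those buckets gives one record per pod per namespace, which is the granularity asked for. This is a client-side sketch with a made-up response; it does not change how a Kibana rule schedules alerts.

```python
from typing import Any


def alerts_per_pod(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten namespace -> pod buckets into one alert record per pod."""
    alerts = []
    for ns_bucket in response["aggregations"]["group_by_namespace"]["buckets"]:
        for pod_bucket in ns_bucket["group_by_pod"]["buckets"]:
            alerts.append(
                {
                    "namespace": ns_bucket["key"],
                    "pod": pod_bucket["key"],
                    "matching_docs": pod_bucket["doc_count"],
                }
            )
    return alerts


# Hypothetical response body for the nested aggregation above.
sample = {
    "aggregations": {
        "group_by_namespace": {
            "buckets": [
                {
                    "key": "ns-a",
                    "doc_count": 3,
                    "group_by_pod": {
                        "buckets": [
                            {"key": "hostinterface-1", "doc_count": 2},
                            {"key": "hostinterface-2", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": "ns-b",
                    "doc_count": 1,
                    "group_by_pod": {
                        "buckets": [{"key": "hostinterface-7", "doc_count": 1}]
                    },
                },
            ]
        }
    }
}

for alert in alerts_per_pod(sample):
    print(alert)
```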

Regards

Thank you, I tried your suggestion. However, I still get only a single alert, not multiple alerts. Can you let me know if there is an issue in my Elastic alert? Here is a screenshot for your reference.

Hi @umesh2020

I think you can do this fairly simply with a normal metric threshold alert, grouped by namespace...

The pattern you're following is one of the basic use cases for the metric threshold alert.
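Roughly, a metric threshold rule with a group-by field aggregates the metric separately for each group value and evaluates the condition per group, so each group that breaches the threshold becomes its own alert. Below is a simplified Python model of that per-group evaluation, using the avg aggregation and > comparator that appear later in this thread; the documents and the function are illustrative assumptions, not Kibana's implementation.

```python
from collections import defaultdict
from statistics import mean


def evaluate_per_group(docs, group_field, metric_field, threshold):
    """Return {group: average} for every group whose average exceeds threshold."""
    values = defaultdict(list)
    for doc in docs:
        values[doc[group_field]].append(doc[metric_field])
    averages = {group: mean(samples) for group, samples in values.items()}
    # Each surviving group would correspond to one separate alert.
    return {group: avg for group, avg in averages.items() if avg > threshold}


# Hypothetical documents from one evaluation window (flattened field names).
docs = [
    {"kubernetes.namespace": "ns-a", "kubernetes.container.status.restarts": 5},
    {"kubernetes.namespace": "ns-a", "kubernetes.container.status.restarts": 7},
    {"kubernetes.namespace": "ns-b", "kubernetes.container.status.restarts": 0},
]
alerts = evaluate_per_group(
    docs, "kubernetes.namespace", "kubernetes.container.status.restarts", threshold=1
)
print(alerts)
```

Adding kubernetes.pod.name as a second group-by field would, in this model, simply make the group key a (namespace, pod) pair.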

Thank you for your response. The reason I chose to use a query is that with the metric threshold rule I was unable to get the pod that restarted, its namespace, etc. I will try it out and get back to you if I have further questions.

I created a metric threshold rule as you suggested, but I don't get the name of the pod that restarted. Here is a partial context for your reference:

{
  "alertState": "ALERT",
  "group": "fi1-https",
  "groupByKeys": {
    "kubernetes": {
      "namespace": "fi1-https"
    }
  },
  "metric": {
    "condition0": "kubernetes.container.status.restarts"
  },
  "reason": "kubernetes.container.status.restarts is 187 in the last 1 min for fi1-https. Alert when > 1.",
  "threshold": {
    "condition0": [
      "1"
    ]
  },
  "timestamp": "2024-01-16T17:05:56.382Z",
  "value": {
    "condition0": "187"
  },
  "tags": []
}

Is there a way to get the pod name, and also the query results, as part of the context?
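One thing worth checking in the context above: a value of 187 restarts within one minute suggests that kubernetes.container.status.restarts is a cumulative restart counter (the total since the container was created) rather than a per-window count. If so, thresholds such as > 1 keep firing for any container that restarted at some point in the past. A common alternative is to alert on the increase of the counter within the window. The sketch assumes time-ordered counter readings for a single pod; the helper name is hypothetical.

```python
def restarts_in_window(samples: list[int]) -> int:
    """Increase of a cumulative restart counter over time-ordered samples.

    A drop in the counter (for example when the pod is recreated and the
    count starts again from zero) is treated as a counter reset.
    """
    increase = 0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # reset: count from zero again
    return increase


# Hypothetical readings for one pod within a single evaluation window.
print(restarts_in_window([187, 187, 187]))  # steady counter: no new restarts
print(restarts_in_window([187, 188, 189]))  # two restarts during the window
```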

I got the pod name by grouping over both namespace and pod name. However, please let me know if there is a way to get a link to the query results that I can use to get more details about the event. I appreciate your help.
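When the rule groups by both kubernetes.namespace and kubernetes.pod.name, each alert's context identifies a single pod. Assuming groupByKeys nests the pod name the same way it nests the namespace in the context shown earlier (verify this against your own alert output), a small helper can pull both values out, for example when post-processing alerts forwarded to a webhook. The context below is made up.

```python
from typing import Any


def pod_from_context(context: dict[str, Any]) -> tuple[str, str]:
    """Return (namespace, pod name) from a metric threshold alert context.

    Assumes groupByKeys mirrors the dotted field names as nested objects.
    """
    kubernetes = context["groupByKeys"]["kubernetes"]
    return kubernetes["namespace"], kubernetes["pod"]["name"]


# Hypothetical context for a rule grouped by namespace and pod name.
context = {
    "alertState": "ALERT",
    "groupByKeys": {
        "kubernetes": {
            "namespace": "fi1-https",
            "pod": {"name": "hostinterface-abc12"},
        }
    },
}
print(pod_from_context(context))
```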

Kibana Alerts do not work that way...

What I do is just put {{.}} in the action message, and you should see everything that is available to you...

Then you can use what you want...

Give that a try...
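Kibana action messages are Mustache templates, so each {{...}} tag is looked up in the variables the rule provides, such as {{context.reason}}, {{context.group}} or {{rule.name}}, and {{.}} is a quick way to see the whole set. To make the lookup concrete, here is a deliberately tiny renderer in Python. It only handles dotted names (no sections, no HTML escaping), and serialising objects as JSON is an assumption meant to mimic what the {{.}} dump looks like, not Kibana's code. Depending on the version, that dump may also reveal a link field (for example context.viewInAppUrl) relevant to the earlier question about linking to the results.

```python
import json
import re
from typing import Any

# Matches {{ name }} tags with dotted paths, e.g. {{context.reason}} or {{.}}.
TAG = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, variables: dict[str, Any]) -> str:
    """Tiny Mustache-style renderer: dotted lookups only, no sections."""

    def lookup(match: re.Match) -> str:
        path = match.group(1)
        if path == ".":
            # Simplification: dump everything, like inspecting {{.}}.
            return json.dumps(variables, sort_keys=True)
        value: Any = variables
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                return ""  # Mustache renders missing variables as empty
            value = value[part]
        return value if isinstance(value, str) else json.dumps(value)

    return TAG.sub(lookup, template)


# Hypothetical variables, loosely based on the context shown in this thread.
variables = {
    "rule": {"name": "pod_status_restart"},
    "context": {"group": "fi1-https", "value": {"condition0": "187"}},
}
print(render("[{{rule.name}}] restarts in {{context.group}}", variables))
```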

Thank you. This is a very good tip.

I have one more small issue. I am creating the rule with the following parameters using the Kibana REST API. The rule gets created successfully; however, it behaves like kubernetes.container.name: * instead of filtering to the containers I am interested in. I have included a snippet of my parameters for your reference.

{
    "name": "pod_status_restart",
    "notify_when": "onActionGroupChange",
    "params": {
        "alertOnGroupDisappear": true,
        "alertOnNoData": true,
        "criteria": [
            {
                "aggType": "avg",
                "comparator": ">",
                "metric": "kubernetes.container.status.restarts",
                "threshold": [
                    3
                ],
                "timeSize": 1,
                "timeUnit": "m"
            }
        ],
        "filterQueryText": "kubernetes.container.name: hostinterface OR kubernetes.container.name: keycloak OR kubernetes.container.name: nginx OR kubernetes.container.name: redis",
        "groupBy": [
            "kubernetes.namespace",
            "kubernetes.pod.name"
        ],
        "nodeType": "pod",
        "sourceId": "default"
    },
    "revision": 0,
    "rule_type_id": "metrics.alert.threshold"
}

However, if I open the rule in the Kibana UI and save it again, it starts working. The difference I see is that saving the rule converts filterQueryText into an actual query and stores it in the "filterQuery" field. Can you let me know why using filterQueryText alone doesn't work?
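From the behaviour described, the rule executor appears to read filterQuery (a serialized Elasticsearch query), while filterQueryText is the KQL string the UI keeps for display and converts when the rule is saved. A plausible workaround when creating the rule through the API is therefore to send both fields yourself. The sketch builds a filterQuery string matching the same four containers with a terms query. This is a hand-written equivalent, not the exact JSON the Kibana UI generates; storing it as a JSON-encoded string is an assumption worth comparing against the value the UI saved.

```python
import json

CONTAINERS = ["hostinterface", "keycloak", "nginx", "redis"]

# KQL kept for the UI, identical to the filterQueryText in the rule above.
filter_query_text = " OR ".join(
    f"kubernetes.container.name: {name}" for name in CONTAINERS
)

# Hand-written Elasticsearch query with the same intent: match any of the
# listed container names. Serialised to a string for the rule params.
filter_query = json.dumps(
    {"bool": {"filter": [{"terms": {"kubernetes.container.name": CONTAINERS}}]}}
)

params_patch = {
    "filterQueryText": filter_query_text,
    "filterQuery": filter_query,
}
print(json.dumps(params_patch, indent=2))
```

Merging params_patch into the params object before calling the create-rule API would mirror what the UI does on save, under the assumptions above.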

I don't recall exactly, but perhaps take a look at this thread; it may help.
