Why are Kibana alerts not consistent?

I have a Kibana alert that does not deliver every alert. The behavior is inconsistent: sometimes it works and sometimes it does not, and I am not sure why.

My condition checks a certain status every minute (now - 1m). Kibana had 5 incidents that fit this criterion, but I only received four alerts.

FYI, these alerts are sent to a Slack channel. Below is the case where I have 5 events but only 4 alerts.

Hi,
Can you paste your watch here (obfuscate the sensitive information)? The Kibana logs would be helpful too, along with the version of the stack you are using.

Thanks
Rashmi

Kibana Version: 7.1.1 Management

Query:

    {
        "size": 1,
        "timeout": "30000ms",
        "query": {
            "bool": {
                "must": [
                    {
                        "match_phrase": {
                            "environment": {
                                "query": "prod",
                                "slop": 0,
                                "zero_terms_query": "NONE",
                                "boost": 1
                            }
                        }
                    },
                    {
                        "match_phrase": {
                            "service": {
                                "query": "tce-order",
                                "slop": 0,
                                "zero_terms_query": "NONE",
                                "boost": 1
                            }
                        }
                    }
                ],
                "filter": [
                    {
                        "range": {
                            "@timestamp": {
                                "from": "now-1m",
                                "to": "now",
                                "include_lower": true,
                                "include_upper": true,
                                "format": "epoch_millis",
                                "boost": 1
                            }
                        }
                    },
                    {
                        "match_phrase": {
                            "data.dimensions.status": {
                                "query": "FAILED",
                                "slop": 0,
                                "zero_terms_query": "NONE",
                                "boost": 1
                            }
                        }
                    }
                ],
                "adjust_pure_negative": true,
                "boost": 1
            }
        },
        "version": true,
        "_source": {
            "includes": [],
            "excludes": []
        },
        "stored_fields": "*",
        "sort": [
            {
                "timestamp": {
                    "order": "desc",
                    "unmapped_type": "boolean"
                }
            }
        ],
        "aggregations": {
            "2": {
                "date_histogram": {
                    "field": "data.timestamp",
                    "time_zone": "America/Los_Angeles",
                    "interval": "10m",
                    "offset": 0,
                    "order": {
                        "_key": "asc"
                    },
                    "keyed": false,
                    "min_doc_count": 1
                }
            }
        }
    }

@rashmi any updates here? If you need any other details from me, I am happy to provide them.

What happens when the watch triggers? Can you please capture the logs and compare them to the time when it does not trigger?

Alternatively, you may want to try the execute watch API with simulated action modes, like this:

POST _watcher/watch/my-watch/_execute
{
  "action_modes" : {
    "_all" : "simulate"
  }
}

This executes your configured search but does not execute the actions, yet it still returns useful info in the JSON response. Using the alternative_input with some scripting on the client side might be easier, to be honest.
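For the client-side scripting route, here is a minimal Python sketch that builds the same simulated execute request. The base URL `http://localhost:9200` is an assumption and should be replaced with your own cluster address (plus authentication if your cluster needs it). The helper name `build_simulated_execute` is only illustrative.

```python
import json
import urllib.request


def build_simulated_execute(base_url: str, watch_id: str) -> urllib.request.Request:
    """Build (without sending) a POST to the execute watch API.

    All actions are put in "simulate" mode, so the search and condition
    run but nothing is actually delivered to Slack.
    """
    body = {"action_modes": {"_all": "simulate"}}
    return urllib.request.Request(
        url=f"{base_url}/_watcher/watch/{watch_id}/_execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address; "my-watch" is the placeholder ID from the request above.
request = build_simulated_execute("http://localhost:9200", "my-watch")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` would send it. The JSON that comes back can then be saved and compared between a run that alerted and a run that did not.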

Here is a link to a similar question on testing a watch: How can Watchers be tested?
https://www.elastic.co/blog/watching-the-watches-writing-debugging-and-testing-watches

I have attached screenshots of the log entries and alerts. When you ask me to capture the logs and compare them to the time when it does not trigger, isn't that sufficient info?

What else should I be looking at? I tried your approach as well, and my alert logic returns results.

The issue I am running into is that I can see log entries for the events, but the alerts are not consistent (i.e. 5 events in the logs but only 4 alerts). This tells me that the alert logic is correct.

Can you check whether Watcher is started by looking at the watcher stats, and paste the output here?

Can you stop and start Watcher, then check the watcher stats once again to see if everything is started?

Alternatively, before doing that, could you pick one watch that currently does not get triggered, store it again, and see if it gets triggered?

I assume there is also nothing interesting in the log files? Has this been a multi-node cluster at some point in time? (I've never seen this so far, so I am super interested in more information.)
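The stats check and the stop/start cycle suggested above could be scripted roughly as follows. This is only a sketch: the base URL is an assumption, the requests are built but not sent, and `all_started` assumes the stats response carries a `stats` list with one `watcher_state` entry per node, which is how 7.x reports it as far as I know.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def watcher_request(method: str, path: str) -> urllib.request.Request:
    """Build (without sending) a request against a Watcher endpoint."""
    return urllib.request.Request(url=f"{BASE_URL}{path}", method=method)


def restart_sequence() -> list:
    """Stats, stop, stats, start, stats: the order suggested in the post."""
    return [
        watcher_request("GET", "/_watcher/stats"),
        watcher_request("POST", "/_watcher/_stop"),
        watcher_request("GET", "/_watcher/stats"),
        watcher_request("POST", "/_watcher/_start"),
        watcher_request("GET", "/_watcher/stats"),
    ]


def all_started(stats: dict) -> bool:
    """True only if at least one node is reported and every node says "started"."""
    nodes = stats.get("stats", [])
    return bool(nodes) and all(n.get("watcher_state") == "started" for n in nodes)


for req in restart_sequence():
    print(req.get_method(), req.full_url)
```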

Thanks
Rashmi

I can try the watcher stats, but in my past experience I saw two things:

  1. In one case the alert was running but did not trigger, so I went to the monitor and just saved it again. After that I received the alert.
  2. In another case the monitor was running fine but I did not get all alerts. So I am not sure what is going on.

@flash1293 - whenever you get a chance, can you please shed more light here?

Thanks
Rashmi

I am using Monitor -> Trigger to send the alert to Slack.

Setup:

  1. There is a monitor, scheduled to run every minute. It uses an extraction query against an index pattern with a wildcard. This is the query I have already posted.
  2. The monitor has one trigger. Its condition is ctx.results[0].hits.total.value > 0. It is configured to send a Slack notification to a particular channel.
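One possible explanation, not confirmed anywhere in this thread, follows from the setup above. A trigger of the form `hits.total.value > 0` fires at most once per run, so two events that fall into the same one-minute window produce a single alert. The Python sketch below simulates that. All timestamps are hypothetical, and it assumes events are searchable as soon as they happen, which ignores any indexing delay.

```python
from datetime import datetime, timedelta


def count_alerts(event_times, run_times, window=timedelta(minutes=1)):
    """Count the runs whose [run - window, run] range holds at least one event.

    Each such run yields exactly one alert, however many events matched.
    """
    alerts = 0
    for run in run_times:
        hits = sum(1 for t in event_times if run - window <= t <= run)
        if hits > 0:  # mirrors ctx.results[0].hits.total.value > 0
            alerts += 1
    return alerts


start = datetime(2019, 7, 1, 12, 0, 0)  # hypothetical start time
runs = [start + timedelta(minutes=i) for i in range(1, 11)]  # one run per minute

# Five hypothetical FAILED events; the second and third share a window.
events = [
    start + timedelta(seconds=30),
    start + timedelta(minutes=2, seconds=10),
    start + timedelta(minutes=2, seconds=40),
    start + timedelta(minutes=5, seconds=15),
    start + timedelta(minutes=8, seconds=5),
]

print(len(events), "events ->", count_alerts(events, runs), "alerts")
```

If this is what is happening, the Slack message for the affected minute would have to report the hit count to account for every event. Comparing event timestamps against the monitor's run times is one way to check.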

My issue is that this works sometimes and sometimes not.

I hope this gives you a clear idea of what I have so far and the issue I am facing.

@spinscale your advice would be helpful here.

Thanks
Rashmi
