How to use Top_Hit in an alarm query

Hi,

We use alarms to get notified about machines that go missing from (or are newly onboarded to) our ELK stack. To do this, we query the heartbeat documents of the last 15 minutes and check whether every machine has at least 12 beats. This should safely indicate whether a machine is up or not. But since I have to tolerate 3 minutes of lost beats, the alarm cannot react any faster. Furthermore, I get false alarms whenever I power up a machine.

Currently we use this alarm query:

{
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "mqput_time": {
                            "from": "{{period_end}}||-15m",
                            "to": "{{period_end}}||-0h",
                            "include_lower": true,
                            "include_upper": true,
                            "format": "epoch_millis",
                            "boost": 1
                        }
                    }
                },
                {
                    "term": {
                        "Payload.IoT.First.Name.keyword": {
                            "value": "Heartbeat",
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "aggregations": {
        "Datenquelle": {
            "composite": {
                "size": 10,
                "sources": [
                    {
                        "LocationCaption": {
                            "terms": {
                                "field": "Payload.Location.Caption.keyword",
                                "missing_bucket": false,
                                "order": "asc"
                            }
                        }
                    },
                    {
                        "IoTFirstName": {
                            "terms": {
                                "field": "Payload.IoT.First.Name.keyword",
                                "missing_bucket": false,
                                "order": "asc"
                            }
                        }
                    }
                ]
            }
        }
    }
}

As a result I get this:

{
    "_shards": {
        "total": 10,
        "failed": 0,
        "successful": 10,
        "skipped": 9
    },
    "hits": {
        "hits": [],
        "total": {
            "value": 41,
            "relation": "eq"
        },
        "max_score": null
    },
    "took": 1010,
    "timed_out": false,
    "aggregations": {
        "Datenquelle": {
            "buckets": [
                {
                    "doc_count": 13,
                    "key": {
                        "IoTFirstName": "Heartbeat",
                        "LocationCaption": "Demo"
                    }
                },
                {
                    "doc_count": 14,
                    "key": {
                        "IoTFirstName": "Heartbeat",
                        "LocationCaption": "Loc1"
                    }
                },
                {
                    "doc_count": 14,
                    "key": {
                        "IoTFirstName": "Heartbeat",
                        "LocationCaption": "Loc2"
                    }
                }
            ],
            "after_key": {
                "IoTFirstName": "Heartbeat",
                "LocationCaption": "Loc3"
            }
        }
    }
}

We then put this into a mail using this template:

Alarm on machine!

reason: 
 Monitor {{ctx.monitor.name}} @ {{ctx.trigger.name}} 

sources:  
{{#ctx.results.0.result}}
  • {{LocationCaption}} with {{value}} of {{ref}} required beats
{{/ctx.results.0.result}}

Link to dashboard:
{{#ctx.results.0.result}}
• {{LocationCaption}}: 
{{{url}}}

{{/ctx.results.0.result}}

This works fine so far, but currently I can only define an alarm based on doc_count staying below a certain threshold. Assuming a station posts 1 heartbeat per minute, there should be at least 12 beats in the window if it is still running.

This is the trigger script that selects the data from the query results (for creating the mail above):

ctx.results[0].result = [];

for (bucket in ctx.results[0].aggregations.Datenquelle.buckets) {
    if (bucket.key.IoTFirstName == "Heartbeat") {
        bucket.key.ref = 12;        // threshold: required number of heartbeats
        bucket.key.url = "myULR)";
        bucket.key.value = bucket.doc_count;

        if (bucket.doc_count < bucket.key.ref) {
            // too few beats in the window: attach to result array
            ctx.results[0].result.add(bucket.key);
        }
    }
}

// trigger only if at least one machine is below the threshold
return ctx.results[0].result.length > 0;

In a dashboard I would rather use Top_Hit than "RED=Count<13". So I think it would be better to have something like:

For identification of "died" machines ("no beat within the last 5 min"):
METACODE: IF Date(Now)-5min > Top_HIT(msg.timestamp) THEN ... ADD TO RESULT

For identification of "new" machines ("oldest beat in scope younger than 5 min"):
METACODE: IF Date(Now)-5min < Last_Hit(msg.timestamp) THEN ... ADD TO RESULT
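
What I have in mind is roughly the following. This is only a sketch, not a tested solution: it keeps my current query and adds two sub-aggregations under the composite aggregation, using plain max/min on the timestamp instead of a real top_hits aggregation. The names last_beat and first_beat are made up by me, and I am assuming that mqput_time (the field I already filter on) is the right timestamp to compare.

```json
{
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "mqput_time": {
                            "gte": "{{period_end}}||-15m",
                            "lte": "{{period_end}}",
                            "format": "epoch_millis"
                        }
                    }
                },
                {
                    "term": {
                        "Payload.IoT.First.Name.keyword": "Heartbeat"
                    }
                }
            ]
        }
    },
    "aggregations": {
        "Datenquelle": {
            "composite": {
                "size": 10,
                "sources": [
                    { "LocationCaption": { "terms": { "field": "Payload.Location.Caption.keyword" } } },
                    { "IoTFirstName": { "terms": { "field": "Payload.IoT.First.Name.keyword" } } }
                ]
            },
            "aggregations": {
                "last_beat": { "max": { "field": "mqput_time" } },
                "first_beat": { "min": { "field": "mqput_time" } }
            }
        }
    }
}
```

Each bucket should then carry last_beat.value and first_beat.value next to doc_count, so the trigger script could compare them against "now minus 5 min" instead of counting beats. How to do that comparison correctly in the trigger is the part I am unsure about.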

It would be fantastic if you could help me create such a query. I googled a lot, but could not find a matching example so far.

Thanks!
