Hi,
I've got several watchers that monitor Kafka metrics. Now something strange is happening: when I create a watch via the API, the watch keeps firing. When I manually save the watcher, the status becomes OK. I tried copying the saved watcher into the JSON I'm inserting with the watch API, but I cannot see any difference. We are using Elastic Cloud on version 8.1.0.
The JSON of the watch:
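A minimal sketch of how the two definitions could be compared programmatically, assuming both have been exported as JSON (for example from the `watch` field of `GET _watcher/watch/<watch_id>`) and loaded into Python dicts. The helper name `diff_json` and the two sample dicts are hypothetical and only illustrate the idea; they are not the real watches.

```python
def diff_json(a, b, path=""):
    """Return a list of paths at which two JSON-like values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else key
            if key not in a:
                diffs.append(f"{child}: only in second")
            elif key not in b:
                diffs.append(f"{child}: only in first")
            else:
                diffs.extend(diff_json(a[key], b[key], child))
        return diffs
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(diff_json(x, y, f"{path}[{i}]"))
        return diffs
    # Also flag type changes such as 5 vs "5", which are easy to miss by eye.
    if a != b or type(a) is not type(b):
        return [f"{path}: {a!r} != {b!r}"]
    return []


# Hypothetical fragments standing in for the API-created and the saved watch.
api_watch = {"condition": {"script": {"params": {"lag_threshold": 5}}}}
saved_watch = {"condition": {"script": {"params": {"lag_threshold": "5"}}}}

for line in diff_json(api_watch, saved_watch):
    print(line)
```

A structural diff like this also catches differences in types or in fields added on save, which a visual comparison of two long JSON documents can miss.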
{
  "trigger": {
    "schedule": {
      "interval": "5s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          ".ds-metrics-kafka.consumergroup-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "kafka.consumergroup.id": "admin-application-user"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "consumers": {
              "terms": {
                "field": "kafka.consumergroup.id"
              },
              "aggs": {
                "lag": {
                  "avg": {
                    "field": "kafka.consumergroup.consumer_lag"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "if (ctx.payload.hits.total == 0) { return true; } else { if(ctx.payload.aggregations.consumers.buckets[0].lag.value > params.lag_threshold) { return true; } return false; }",
      "lang": "painless",
      "params": {
        "lag_threshold": 5
      }
    }
  },
  "actions": {
    "notify-pagerduty": {
      "throttle_period_in_millis": 172800000,
      "pagerduty": {
        "description": "[Error] Found {{ ctx.payload.hits.total }} docs. The admin-application-user consumer is down or exceeds the lag threshold of 5. See the payload for details",
        "attach_payload": true,
        "account": "kafka"
      }
    }
  }
}
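To make the intended condition explicit, here is a rough Python mirror of the Painless script above. It is only a sketch: the function name `condition_met` and the sample payloads are made up for illustration, and Python's comparison semantics are not identical to Painless in edge cases (for example a null `lag.value`).

```python
def condition_met(payload, lag_threshold=5):
    """Mirror of the watch condition: fire when no docs are found,
    or when the average lag of the first bucket exceeds the threshold."""
    if payload["hits"]["total"] == 0:
        return True
    buckets = payload["aggregations"]["consumers"]["buckets"]
    return buckets[0]["lag"]["value"] > lag_threshold


# Hypothetical payloads, shaped like the search response the watch expects
# when rest_total_hits_as_int is true (hits.total is a plain integer).
no_docs = {"hits": {"total": 0}, "aggregations": {"consumers": {"buckets": []}}}
low_lag = {
    "hits": {"total": 12},
    "aggregations": {"consumers": {"buckets": [{"lag": {"value": 2.0}}]}},
}
high_lag = {
    "hits": {"total": 12},
    "aggregations": {"consumers": {"buckets": [{"lag": {"value": 12.5}}]}},
}

print(condition_met(no_docs), condition_met(low_lag), condition_met(high_lag))
```

Feeding the payload from an actual watch execution into a check like this shows which branch of the condition is causing the watch to fire.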
Any suggestions on how to tackle this problem?
Gerard