Hi,
I set up two Kibana Watchers in two different deployments last week. Now one of the watchers doesn't trigger. It stopped randomly last Friday, even though I didn't edit anything around that time. Even if I delete the watcher and create a new one, no watcher works: nothing triggers, i.e., nothing shows up in the execution history. My interval is 300s.
What could be the issue? The code is almost identical to the watch that works fine in the other deployment.
Here is the code:
```
{
  "trigger": {
    "schedule": {
      "interval": "300s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "functionbeat*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 10,
          "query": {
            "bool": {
              "must": {
                "bool": {
                  "should": {
                    "terms": {
                      "level": [
                        "critical",
                        "alert",
                        "emergency"
                      ]
                    }
                  }
                }
              },
              "filter": {
                "range": {
                  "@timestamp": {
                    "gte": "{{ctx.trigger.scheduled_time}}||-315s",
                    "lte": "{{ctx.trigger.scheduled_time}}",
                    "format": "strict_date_optional_time||epoch_millis"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "if (ctx.payload.hits.total > 0) return true; else return false;",
      "lang": "painless"
    }
  },
  "actions": {
    "slack_1": {
      "slack": {
        "message": {
          "text": """[{{ctx.metadata.name}}] just noticed following exceptions:
{{#ctx.payload.hits.hits}}- [{{_source.application}}] {{_source.level}}({{_source.app.monolog_level}}) with message:
[{{_source.message}}]
[Link="random url"/{{_index}}?id={{_id}}]
{{/ctx.payload.hits.hits}} in the last 5 minutes!"""
        }
      }
    }
  }
}
```
This happened after upgrading to v8.1.2.
Manually triggering the watcher via the API works fine, so I have no clue why it isn't triggering automatically.
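For reference, manual execution goes through the execute watch API (`my_watch` below is a placeholder for the actual watch ID):

```
POST _watcher/watch/my_watch/_execute
```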
When I run GET _watcher/stats, this is returned:
"manually_stopped" : false,
"stats" : [
{
"node_id" : "SnOONmkeRdagszfnBE40pg",
"watcher_state" : "stopped",
"watch_count" : 0,
"execution_thread_pool" : {
"queue_size" : 0,
"max_size" : 10
}
}
]
Maybe my issue has something to do with `"watcher_state" : "stopped"`.
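For anyone in the same situation: a stopped Watcher can normally be restarted via the start API, so that is worth trying first:

```
POST _watcher/_start
GET _watcher/stats
```

If it immediately reports `"stopped"` again even though `"manually_stopped"` is false, something is preventing the start — which is exactly what turned out to be happening here.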
Found a solution for this, if anyone else is interested. It's described in a bug report in the Elasticsearch GitHub repo (Data Management/Watcher, opened 30 Mar 2022, closed 1 Apr 2022):
### Description
Watcher will perform some basic validation prior to starting. One part of this validation is ensuring that if there is an alias to the .watcher-history-* index, it only contains a single index. In 8.0+ this validation can incorrectly cause Watcher to fail to start. In 8.0+ .watcher-history-* is a data stream (not an alias), and the data stream is expected to have multiple backing indices (unlike the older alias support). The implementation/API is shared between resolving aliases and resolving data streams, so the validation will incorrectly flag multiple backing indices for data streams as problematic and cause Watcher to fail to start.
This issue will only happen when the .watcher-history-16 data stream has multiple backing indices AND Watcher needs to restart for any reason. Watcher can need to restart on cluster start, explicit start/stop, and any change to the allocation assignment of the .watches shards.
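You can check how many backing indices the data stream currently has with the following request (the same call appears in the reproduction steps below); more than one entry under `indices` means the validation will fail:

```
GET _data_stream/.watcher-history-16?expand_wildcards=all
```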
The error message (only seen with the logger at debug level, `"logger.org.elasticsearch.xpack.watcher": "debug"`) is:
```
[2022-03-30T13:13:48,011][DEBUG][o.e.x.w.WatcherService ] [runTask-0] error validating to start watcher
java.lang.IllegalStateException: Alias [.watcher-history-16] points to more than one index
at org.elasticsearch.xpack.watcher.watch.WatchStoreUtils.getConcreteIndex(WatchStoreUtils.java:32) ~[x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.xpack.watcher.history.HistoryStore.validate(HistoryStore.java:79) ~[x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.xpack.watcher.WatcherService.validate(WatcherService.java:157) [x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.xpack.watcher.WatcherLifeCycleService.clusterChanged(WatcherLifeCycleService.java:162) [x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:564) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:550) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:510) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:714) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
```
You can determine if this is the case by looking at `GET _watcher/stats`: `"watcher_state": "stopped"` combined with `"manually_stopped": false` means that the system, via validation, stopped Watcher rather than a user. (You can also tell by increasing the log level and reviewing the logs.)
You can reproduce with the following steps:
```
PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.xpack.watcher": "debug"
  }
}

PUT _watcher/watch/test
{
  "trigger": {
    "schedule": {
      "interval": "1s"
    }
  },
  "input": {
    "simple": {
      "key": "value"
    }
  },
  "actions": {
    "my_logger": {
      "logging": {
        "text": "Watch payload [{{ctx.payload}}]",
        "level": "debug"
      }
    }
  }
}
GET _data_stream/.watcher-history-16?expand_wildcards=all
POST _watcher/_stop
POST _watcher/_start
GET _watcher/stats
POST .watcher-history-16/_rollover
GET _data_stream/.watcher-history-16?expand_wildcards=all
POST _watcher/_stop
POST _watcher/_start
GET _watcher/stats
```
You can work around this issue by ensuring there is never more than one backing index in the .watcher-history-16 data stream. Use the ILM policy to ensure at most one index for the watcher history.
```
# increase max age to 7 days and remove min age from the delete phase of the default policy
PUT _ilm/policy/watch-history-ilm-policy-16
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    },
    "_meta": {
      "managed": true,
      "description": "default policy for the watcher history indices"
    }
  }
}

# wait up to 10 minutes (depending on ILM poll interval) to see a single backing index
GET _data_stream/.watcher-history-16?expand_wildcards=all
POST _watcher/_start
GET _watcher/stats
```
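Once Watcher is running again, a quick way to confirm that scheduled executions are resuming is to search the watch history for the most recent record (a minimal sketch; I'm assuming the standard `result.execution_time` field of watcher history documents):

```
GET .watcher-history-*/_search
{
  "size": 1,
  "sort": [
    { "result.execution_time": { "order": "desc" } }
  ]
}
```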
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.