Hi,
I set up two Kibana Watchers in two different deployments last week. Now one of the watchers doesn't trigger. It stopped randomly last Friday, even though I didn't edit anything around that time. Even if I delete the watcher and create a new one, no watcher works: nothing triggers, i.e., nothing shows up in the execution history. My interval is 300s.
What could be the issue? The code is almost identical to the watch that works fine in the other deployment.
Here is the code:
```
{
  "trigger": {
    "schedule": {
      "interval": "300s"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "functionbeat*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 10,
          "query": {
            "bool": {
              "must": {
                "bool": {
                  "should": {
                    "terms": {
                      "level": [
                        "critical",
                        "alert",
                        "emergency"
                      ]
                    }
                  }
                }
              },
              "filter": {
                "range": {
                  "@timestamp": {
                    "gte": "{{ctx.trigger.scheduled_time}}||-315s",
                    "lte": "{{ctx.trigger.scheduled_time}}",
                    "format": "strict_date_optional_time||epoch_millis"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "if (ctx.payload.hits.total > 0) return true; else return false;",
      "lang": "painless"
    }
  },
  "actions": {
    "slack_1": {
      "slack": {
        "message": {
          "text": """[{{ctx.metadata.name}}] just noticed following exceptions:
{{#ctx.payload.hits.hits}}- [{{_source.application}}] {{_source.level}}({{_source.app.monolog_level}}) with message:
[{{_source.message}}]
[Link="random url"/{{_index}}?id={{_id}}]
{{/ctx.payload.hits.hits}} in the last 5 minutes!"""
        }
      }
    }
  }
}
```
This happened after upgrading to v8.1.2.
Manually triggering the watcher via the API works fine, so I have no clue why it isn't triggering automatically.
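For reference, manual execution goes through the execute watch API (`my_watch` below is a placeholder for the actual watch ID):

```
POST _watcher/watch/my_watch/_execute
```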
When I run GET _watcher/stats, this is returned:
"manually_stopped" : false,
"stats" : [
{
"node_id" : "SnOONmkeRdagszfnBE40pg",
"watcher_state" : "stopped",
"watch_count" : 0,
"execution_thread_pool" : {
"queue_size" : 0,
"max_size" : 10
}
}
]
Maybe my issue has something to do with `"watcher_state" : "stopped"`.
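For anyone in the same situation: a stopped Watcher can normally be restarted via the start API, so that is worth trying first:

```
POST _watcher/_start
GET _watcher/stats
```

If it immediately reports `"stopped"` again even though `"manually_stopped"` is false, something is preventing the start — which is exactly what turned out to be happening here.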
Found a solution for this, if anyone else is interested. It's described in a bug report in the Elasticsearch GitHub repo (Data Management/Watcher, opened 30 Mar 2022, closed 1 Apr 2022):
### Description
Watcher will perform some basic validation prior to starting. One part of this validation is ensuring that if there is an alias to the .watcher-history-* index, it only contains a single index. In 8.0+ this validation can incorrectly cause Watcher to fail to start. In 8.0+ .watcher-history-* is a data stream (not an alias), and the data stream is expected to have multiple backing indices (unlike the older alias support). The implementation/API is shared between resolving aliases and resolving data streams, so the validation will incorrectly flag multiple backing indices for data streams as problematic and cause Watcher to fail to start.
This issue will only happen when the .watcher-history-16 data stream has multiple backing indices AND Watcher needs to restart for any reason. Watcher can need to restart on cluster start, explicit start/stop, and any change to the allocation assignment of the .watches shards.
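You can check how many backing indices the data stream currently has with the following request (the same call appears in the reproduction steps below); more than one entry under `indices` means the validation will fail:

```
GET _data_stream/.watcher-history-16?expand_wildcards=all
```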
The error message (only seen with the logger at debug level, `"logger.org.elasticsearch.xpack.watcher": "debug"`) is:
```
[2022-03-30T13:13:48,011][DEBUG][o.e.x.w.WatcherService ] [runTask-0] error validating to start watcher
java.lang.IllegalStateException: Alias [.watcher-history-16] points to more than one index
at org.elasticsearch.xpack.watcher.watch.WatchStoreUtils.getConcreteIndex(WatchStoreUtils.java:32) ~[x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.xpack.watcher.history.HistoryStore.validate(HistoryStore.java:79) ~[x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.xpack.watcher.WatcherService.validate(WatcherService.java:157) [x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.xpack.watcher.WatcherLifeCycleService.clusterChanged(WatcherLifeCycleService.java:162) [x-pack-watcher-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:564) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:550) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:510) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:714) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) [elasticsearch-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
```
You can determine if this is the case by looking at `GET _watcher/stats`: `"watcher_state": "stopped"` combined with `"manually_stopped": false` means that the system, via validation, stopped Watcher rather than a user. (You can also tell by increasing the log level and reviewing the logs.)
You can reproduce with the following steps:
```
PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.xpack.watcher": "debug"
  }
}

PUT _watcher/watch/test
{
  "trigger": {
    "schedule": {
      "interval": "1s"
    }
  },
  "input": {
    "simple": {
      "key": "value"
    }
  },
  "actions": {
    "my_logger": {
      "logging": {
        "text": "Watch payload [{{ctx.payload}}]",
        "level": "debug"
      }
    }
  }
}
GET _data_stream/.watcher-history-16?expand_wildcards=all
POST _watcher/_stop
POST _watcher/_start
GET _watcher/stats
POST .watcher-history-16/_rollover
GET _data_stream/.watcher-history-16?expand_wildcards=all
POST _watcher/_stop
POST _watcher/_start
GET _watcher/stats
```
You can work around this issue by ensuring there is never more than one backing index in the .watcher-history-16 data stream. Use the ILM policy to ensure at most one index for the watcher history.
```
# increase max age to 7 days and remove min age from the delete phase of the default policy
PUT _ilm/policy/watch-history-ilm-policy-16
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    },
    "_meta": {
      "managed": true,
      "description": "default policy for the watcher history indices"
    }
  }
}

# wait up to 10 minutes (depending on ILM poll interval) to see a single backing index
GET _data_stream/.watcher-history-16?expand_wildcards=all
POST _watcher/_start
GET _watcher/stats
```
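Once Watcher is running again, a quick way to confirm that scheduled executions are resuming is to search the watch history for the most recent record (a minimal sketch; I'm assuming the standard `result.execution_time` field of watcher history documents):

```
GET .watcher-history-*/_search
{
  "size": 1,
  "sort": [
    { "result.execution_time": { "order": "desc" } }
  ]
}
```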
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.