Watches Not Triggering

alerting

#1

Elasticsearch & Kibana version: 6.2.4

Hi,

This morning we noticed that several (but not all) of our watches were not triggering and had no watcher history. In particular, one watch should have been firing alerts but was completely silent. Triggering it manually with POST _xpack/watcher/watch/{watch_id}/_execute caused it to fire.

Stopping and starting Watcher with POST _xpack/watcher/_stop and POST _xpack/watcher/_start caused all watches to resume triggering. Watcher histories are being recorded again as well.
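For anyone hitting the same symptom, this is the full recovery sequence I ran from Kibana Dev Tools (the watch id is a placeholder, not one of our real watches):

```
# Execute a watch manually to confirm its condition and actions still work
POST _xpack/watcher/watch/my_watch_id/_execute

# Restart the Watcher service across the cluster
POST _xpack/watcher/_stop
POST _xpack/watcher/_start

# Confirm every node reports "watcher_state": "started"
GET _xpack/watcher/stats
```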

This seems similar to the issue encountered here, and I am wondering whether this is a known bug in X-Pack. The watches are working as expected now, but without knowing the root cause I can't be sure it won't happen again.

Here are some details on the cluster:

Five nodes running Elasticsearch 6.2.4, one Kibana 6.2.4 instance, and one Logstash 6.2.4 instance. We expanded from a 3-node to a 5-node cluster a few weeks ago; I am not sure whether that coincides with when the watches stopped.

Result of GET _xpack/watcher/stats (run after restarting Watcher, so I'm not sure how much help it will be):

{
  "_nodes": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "cluster_name": "clustername",
  "manually_stopped": false,
  "stats": [
    {
      "node_id": "L8C9e9FgSTulsHHs7vEqLQ",
      "watcher_state": "started",
      "watch_count": 12,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 20
      }
    },
    {
      "node_id": "fdKsVvaFQ9i6er6ck5eoVQ",
      "watcher_state": "started",
      "watch_count": 15,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 20
      }
    },
    {
      "node_id": "RskSddGqQ26Doz6y4mmTsg",
      "watcher_state": "started",
      "watch_count": 0,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 0
      }
    },
    {
      "node_id": "C2tmZvL8QbWZ1CRZCHGWPQ",
      "watcher_state": "started",
      "watch_count": 0,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 20
      }
    },
    {
      "node_id": "J043eFCoSV65Pd5vT9XERQ",
      "watcher_state": "started",
      "watch_count": 0,
      "execution_thread_pool": {
        "queue_size": 0,
        "max_size": 0
      }
    }
  ]
}

elasticsearch.yml:

cluster.name: clustername
node.name: ch-elas-epm01-p
path.data: /data/elasticsearch/data
network.host: 0.0.0.0
bootstrap.memory_lock: true
http.port: 9200
xpack.security.enabled: false
thread_pool.search.queue_size: 10000
discovery.zen.ping.unicast.hosts: ["ch-elas-epm01-p","ch-elas-epm02-p","ch-elas-epm03-p","ch-elas-epm06-p","ch-elas-epm07-p"]
discovery.zen.minimum_master_nodes: 3
path.logs: /data/elasticsearch/logs
action.destructive_requires_name: true
xpack.notification.email:
    account:
        smtp_acct:
            profile: standard
            smtp.host: smtpserverhost
            smtp.auth: false
            smtp.user: no-reply@companydomain.com
    default_account: smtp_acct
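For context, our watches send alerts through this account via the standard email action. A representative watch, simplified with placeholder names and recipients (not one of our actual watches), looks like:

```
PUT _xpack/watcher/watch/cluster_health_example
{
  "trigger": {
    "schedule": { "interval": "10m" }
  },
  "input": {
    "http": {
      "request": { "host": "localhost", "port": 9200, "path": "/_cluster/health" }
    }
  },
  "condition": {
    "compare": { "ctx.payload.status": { "eq": "red" } }
  },
  "actions": {
    "email_ops": {
      "email": {
        "account": "smtp_acct",
        "to": "ops@companydomain.com",
        "subject": "Cluster health is red"
      }
    }
  }
}
```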

#2

Logs:

Looking through the Elasticsearch logs, I found entries showing that watches failed to execute, along with exceptions:


...

[2018-09-19T18:40:52,198][WARN ][o.e.x.w.e.ExecutionService] [ch-elas-epm01-p] failed to execute watch [t_cfCSv2RYaRWtPmFD1lkg_elasticsearch_nodes]
[2018-09-19T19:01:37,673][ERROR][o.e.x.w.e.ExecutionService] [ch-elas-epm01-p] failed to update watch record [t_cfCSv2RYaRWtPmFD1lkg_elasticsearch_nodes_fd7af7c7-7e60-4aa6-ab70-a323e1b51838-2018-09-20T00:00:52.501Z]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:68) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.history.HistoryStore.put(HistoryStore.java:97) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService.execute(ExecutionService.java:337) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService.lambda$executeAsync$6(ExecutionService.java:421) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService$WatchExecutionTask.run(ExecutionService.java:575) [x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
    at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:235) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    ... 8 more

[2018-09-19T19:01:37,702][ERROR][o.e.x.w.e.ExecutionService] [ch-elas-epm01-p] failed to update watch record [t_cfCSv2RYaRWtPmFD1lkg_xpack_license_expiration_2ddded28-df9e-4033-8588-95c4743b985a-2018-09-20T00:00:47.466Z]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:68) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.history.HistoryStore.put(HistoryStore.java:97) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService.execute(ExecutionService.java:337) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService.lambda$executeAsync$6(ExecutionService.java:421) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService$WatchExecutionTask.run(ExecutionService.java:575) [x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
    at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:235) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    ... 8 more

[2018-09-19T19:01:37,721][ERROR][o.e.x.w.e.ExecutionService] [ch-elas-epm01-p] failed to update watch record [cdm-dead-high-depth_ecedbc9b-7c02-485b-a101-52985710d404-2018-09-20T00:00:37.461Z]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:68) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.history.HistoryStore.put(HistoryStore.java:97) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService.execute(ExecutionService.java:337) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService.lambda$executeAsync$6(ExecutionService.java:421) ~[x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.xpack.watcher.execution.ExecutionService$WatchExecutionTask.run(ExecutionService.java:575) [x-pack-watcher-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
    at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:235) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:69) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    ... 8 more

...


(Alexander Reelsen) #3

Hey,

there have been two bug fixes in that area, one in 6.3.0 and one in 6.4.1, so I'd highly recommend upgrading to 6.4.1. You can read about the bug at https://github.com/elastic/elasticsearch/pull/33167

Short version: the reload was not happening properly when .watches shards were being moved around the cluster. A reload basically did a re-initialization, which would explain the behaviour you are seeing above.

Regarding your second post: this would not have stopped execution of the watch itself. However, the history entry failed to be indexed within 30 seconds, so it was never written. Please monitor this and ping us if you see it again. I'd also be interested in the thread pool node stats and node info when this is happening.
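Concretely, if it happens again, capturing something like the following at the time of the problem would help (these are standard 6.x APIs):

```
# Per-node thread pool statistics (queue sizes, rejections)
GET _nodes/stats/thread_pool

# Node info, including thread pool configuration
GET _nodes/info

# Watcher stats including currently executing and queued watches,
# with stack traces of the executing ones
GET _xpack/watcher/stats?metric=_all&emit_stacktraces=true
```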

Thanks a lot!

--Alex


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.