I am seeing random watch timeouts on my cluster, with the following stack trace in the logs:
[2021-07-31T05:08:11,951][DEBUG][o.e.x.w.e.ExecutionService] failed to execute watch [<INSERT RANDOM WATCHER HERE>]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:78) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:61) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:55) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.xpack.watcher.execution.ExecutionService.updateWatchStatus(ExecutionService.java:380) [x-pack-watcher-7.10.1.jar:7.10.1]
at org.elasticsearch.xpack.watcher.execution.ExecutionService.execute(ExecutionService.java:321) [x-pack-watcher-7.10.1.jar:7.10.1]
at org.elasticsearch.xpack.watcher.execution.ExecutionService.lambda$executeAsync$5(ExecutionService.java:420) [x-pack-watcher-7.10.1.jar:7.10.1]
at org.elasticsearch.xpack.watcher.execution.ExecutionService$WatchExecutionTask.run(ExecutionService.java:626) [x-pack-watcher-7.10.1.jar:7.10.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) [elasticsearch-7.10.1.jar:7.10.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:243) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:65) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:76) ~[elasticsearch-7.10.1.jar:7.10.1]
... 10 more
I think the key part is that it's failing on:
org.elasticsearch.xpack.watcher.execution.ExecutionService.updateWatchStatus(ExecutionService.java:380) [x-pack-watcher-7.10.1.jar:7.10.1]
In the source code, that line corresponds to:
client.update(updateRequest).actionGet(indexDefaultTimeout);
I assume it is trying to update the watch's document in the .watches index and that the update is taking too long, but I am not 100% sure. Has anyone seen this error before, and if so, is there a way to resolve it? The watchers are running on cool nodes, and the cluster is pretty big, but load doesn't seem to get very high on these nodes.
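
For illustration, here is a minimal sketch of the failure mode I suspect: a caller blocks on a future with a bounded wait and gets a TimeoutException when the work takes longer than the timeout. It uses only java.util.concurrent, not Elasticsearch. The class name BoundedWaitSketch, the method outcome, and the sleep standing in for the slow update are all hypothetical, and this is not the actual Watcher code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {

    // Starts a task that takes taskMillis to finish, then waits at most
    // timeoutMillis for its result (an analogy for a blocking get with a timeout).
    static String outcome(long taskMillis, long timeoutMillis) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(taskMillis); // hypothetical stand-in for a slow update
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "updated";
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Only the wait gave up; the task itself may still finish later.
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(10, 2000)); // fast task, generous timeout
        System.out.println(outcome(2000, 50)); // slow task, short timeout
    }
}
```

If this analogy holds, the timeout only says that the caller stopped waiting. It does not say whether the update itself eventually succeeded, so a slow but successful write to the index could still produce this log line.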