Watcher throwing thread pool capacity error

alerting

(Sumit Arora) #1

Hello Folks,

We're running the Watcher beta version with a few watches configured. After running for a few days, we started getting a "failed to run triggered watch [someID] due to thread pool capacity" error, which stopped watcher execution entirely. None of the watches are getting triggered due to this issue. Help in this regard would be greatly appreciated.

Thanks
Sumit


(Martijn Van Groningen) #2

Hi Sumit,

A few questions:

  • How many watches have you configured?
  • How often are these watches configured to trigger?
  • What is the nature of your watches? For example, are they configured to send emails?
  • Can you share the output of the watch info api:

curl 'localhost:9200/_watcher?pretty'

Then we can see which version you're running.

  • Can you share the output of the watch stats api:

curl 'localhost:9200/_watcher/stats?_all&pretty&emit_stacktraces'

If the problem is still occurring we can see what watch is potentially causing this problem.

Martijn


(Sumit Arora) #3

Thanks Martijn for the quick reply.

Please find inline answers to your questions below.

  • How many watches have you configured? Seven watches in total.
  • How often are these watches configured to trigger? One about every 15s (the one which threw the error initially, see below) and the remaining 6 watches every 1m.
  • What is the nature of your watches? For example, are they configured to send emails? Yes, all watches are configured to send mail.
  • Can you share the output of the watch info api:

GET /_watcher?pretty
{
  "version": {
    "name": "1.0.0-Beta1",
    "number": "1.0.0.Beta1",
    "build_hash": "b4bc48a200d93296b4731641f6dbdf81feb46cbf",
    "build_timestamp": "2015-05-20T12:29:28Z",
    "build_snapshot": false
  }
}

Can you share the output of the watch stats api:

curl 'localhost:9200/_watcher/stats?_all&pretty&emit_stacktraces'

{
  "watcher_state": "started",
  "watch_count": 7,
  "execution_queue": {
    "size": 1000,
    "max_size": 40
  }
}

Below is the watch that started giving the problem initially; we got the error through the watch history index. After this we could see failed states for all the other watches.

{
  "_index": ".watch_history-2015.07.01",
  "_type": "watch_record",
  "_id": "82b8ab81-a5e8-4ebd-ba89-32f91ea79a41_419-2015-07-01T01:18:55.196Z",
  "_score": null,
  "_source": {
    "watch_id": "82b8ab81-a5e8-4ebd-ba89-32f91ea79a41",
    "trigger_event": {
      "schedule": {
        "triggered_time": "2015-07-01T01:18:55.196Z",
        "scheduled_time": "2015-07-01T01:18:55.139Z"
      }
    },
    "input": {
      "search": {
        "request": {
          "search_type": "query_then_fetch",
          "indices": [
            "workflow_test_connectors"
          ],
          "types": [
            "processingworkflowbatchrequests"
          ],
          "body": {
            "aggs": {
              "status_count": {
                "filter": {
                  "term": {
                    "workflowRequestId": "82b8ab81"
                  }
                },
                "aggs": {
                  "status_num": {
                    "terms": {
                      "field": "status"
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "condition": {
      "always": {}
    },
    "state": "failed",
    "message": "failed to run triggered watch [82b8ab81-a5e8-4ebd-ba89-32f91ea79a41] due to thread pool capacity"
  },
  "fields": {
    "trigger_event.schedule.triggered_time": [
      1435713535196
    ],
    "execution_result.execution_time": [
      null
    ],
    "trigger_event.schedule.scheduled_time": [
      1435713535139
    ]
  },
  "sort": [
    1435713535196
  ]
}

Thanks,
Sumit


(Martijn Van Groningen) #4

Unfortunately the watch stats api doesn't give the output I hoped to see. Can you upgrade to the latest Watcher release (1.0)? The watch stats api has been updated between the beta1 and the 1.0 release to include more information.

If you upgrade, be sure to follow the upgrade instructions (deleting existing watches etc.), otherwise Watcher can't start properly.

I think some watch executions are stuck, and this blocks other watch executions. If you upgrade to 1.0 the watch stats api can confirm this. Also, the 1.0 release is a bit more aggressive towards operations that hang for a long time / forever (for example, a search input that takes too long to execute will be timed out).


#5

Hi @mvg, I work with @sumithub and thought I would try to get a response from you before the weekend.

In regards to upgrading, I'm sure that will be fine; however, there is one watch that can't be deleted from one of the nodes. Is there a way to force delete a watch or something? I don't have the error message we get when trying to delete that watch, but we can add it to the thread on Monday if you need to see it.


(Martijn Van Groningen) #6

You can force delete a watch that is currently being used with the force option:
DELETE _watcher/watch/my-watch?force

Alternatively, you can stop all the nodes, uninstall Watcher on all nodes, start the nodes again, and delete all Watcher-related indices.
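For the last step, the cleanup could look roughly like the sketch below. The index names are an assumption on my part and may differ per Watcher version, so verify what actually exists on your cluster first:

```
# list the Watcher-related indices present on the cluster
curl 'localhost:9200/_cat/indices/.watch*'

# then delete them (assumed names; adjust to what the previous call returned)
curl -XDELETE 'localhost:9200/.watches'
curl -XDELETE 'localhost:9200/.watch_history-*'
```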


(Sumit Arora) #7

Hi Martijn,

After upgrading to the 1.0.0 version and deleting all old watcher indices, the thread pool error is gone.

Now we're faced with another exception for the watch below. The error is repeated every minute.

Error:

failed to execute action [node_event/send_email]
org.elasticsearch.common.mustache.MustacheException: Index: 0, Size: 0@[query-template:1]
at org.elasticsearch.common.mustache.codes.DefaultCode.get(DefaultCode.java:107)
at org.elasticsearch.common.mustache.codes.ValueCode.execute(ValueCode.java:55)
at org.elasticsearch.common.mustache.codes.DefaultMustache.run(DefaultMustache.java:30)
at org.elasticsearch.common.mustache.codes.DefaultCode.execute(DefaultCode.java:126)
at org.elasticsearch.common.mustache.codes.DefaultCode.execute(DefaultCode.java:115)
at org.elasticsearch.watcher.support.template.xmustache.XMustacheScriptEngineService$MustacheExecutableScript.run(XMustacheScriptEngineService.java:183)
at org.elasticsearch.watcher.support.template.xmustache.XMustacheTemplateEngine.render(XMustacheTemplateEngine.java:51)
at org.elasticsearch.watcher.actions.email.service.EmailTemplate.render(EmailTemplate.java:121)
at org.elasticsearch.watcher.actions.email.ExecutableEmailAction.execute(ExecutableEmailAction.java:60)
at org.elasticsearch.watcher.actions.ActionWrapper.execute(ActionWrapper.java:104)
at org.elasticsearch.watcher.execution.ExecutionService.executeInner(ExecutionService.java:379)
at org.elasticsearch.watcher.execution.ExecutionService.execute(ExecutionService.java:271)
at org.elasticsearch.watcher.execution.ExecutionService$WatchExecutionTask.run(ExecutionService.java:417)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.elasticsearch.common.collect.Iterables.get(Iterables.java:728)
at org.elasticsearch.watcher.support.template.xmustache.XMustacheFactory$CollectionMap.get(XMustacheFactory.java:137)
at org.elasticsearch.watcher.support.template.xmustache.XMustacheFactory$CollectionMap.containsKey(XMustacheFactory.java:146)
at org.elasticsearch.common.mustache.reflect.ReflectionObjectHandler.findWrapper(ReflectionObjectHandler.java:125)
at org.elasticsearch.common.mustache.reflect.ReflectionObjectHandler.find(ReflectionObjectHandler.java:72)
at org.elasticsearch.common.mustache.reflect.GuardedBinding.getWrapper(GuardedBinding.java:94)
at org.elasticsearch.common.mustache.reflect.GuardedBinding.createAndGet(GuardedBinding.java:79)
at org.elasticsearch.common.mustache.reflect.GuardedBinding.get(GuardedBinding.java:74)
at org.elasticsearch.common.mustache.codes.DefaultCode.get(DefaultCode.java:105)
... 15 more

Output from stats:
GET /_watcher/stats?pretty
{
  "watcher_state": "started",
  "watch_count": 6,
  "execution_thread_pool": {
    "queue_size": 0,
    "max_size": 40
  }
}

PUT _watcher/watch/node_event/execute/_ack

{
  "_status": {
    "last_checked": "2015-07-14T23:06:43.406Z",
    "last_met_condition": "2015-07-14T23:06:43.406Z",
    "actions": {
      "send_email": {
        "ack": {
          "timestamp": "2015-07-14T23:05:43.039Z",
          "state": "awaits_successful_execution"
        },
        "last_execution": {
          "timestamp": "2015-07-14T23:06:43.406Z",
          "successful": false,
          "reason": "MustacheException[Index: 0, Size: 0@[query-template:1]]; nested: IndexOutOfBoundsException[Index: 0, Size: 0]; "
        }
      }
    }
  }
}

The watch is from the Elasticsearch use case example:

PUT _watcher/watch/node_event
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          ".marvel-*"
        ],
        "search_type": "query_then_fetch",
        "body": {
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "should": [
                    {
                      "match": {
                        "event": "node_left"
                      }
                    },
                    {
                      "match": {
                        "event": "node_joined"
                      }
                    }
                  ]
                }
              },
              "filter": {
                "range": {
                  "@timestamp": {
                    "from": "{{ctx.trigger.scheduled_time}}||-60s",
                    "to": "{{ctx.trigger.triggered_time}}"
                  }
                }
              }
            }
          },
          "fields": [
            "event",
            "message",
            "cluster_name"
          ],
          "sort": [
            {
              "@timestamp": {
                "order": "desc"
              }
            }
          ]
        }
      }
    }
  },
  "throttle_period": "1m",
  "condition": {
    "script": {
      "inline": "ctx.payload.hits.size() > 0 "
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "team@abc.com",
        "subject": "{{ctx.payload.hits.hits.0.fields.event}} the cluster",
        "body": "DEMO CLUSTER {{ctx.payload.hits.hits.0.fields.message}} the cluster {{ctx.payload.hits.hits.0.fields.cluster_name}} "
      }
    }
  }
}

Hope the above information helps to debug the issue.

Thanks,
Sumit


(Martijn Van Groningen) #8

This error is confusing; an IndexOutOfBoundsException should never bubble up like this, so that should be fixed.

Are you sure that the following expressions yield results?

  • ctx.payload.hits.hits.0.fields.event
  • ctx.payload.hits.hits.0.fields.message
  • ctx.payload.hits.hits.0.fields.cluster_name

Also, I think you need to use _source instead of fields. The fields notation can only be used if you specifically tell the _search api to return stored fields; otherwise you should use _source, which is included by default.
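Applying that suggestion, the email action templates would reference _source rather than fields. This is only a sketch of how I read the advice (it assumes the "fields" clause is dropped from the search body so full _source documents come back):

```json
"actions": {
  "send_email": {
    "email": {
      "to": "team@abc.com",
      "subject": "{{ctx.payload.hits.hits.0._source.event}} the cluster",
      "body": "DEMO CLUSTER {{ctx.payload.hits.hits.0._source.message}} the cluster {{ctx.payload.hits.hits.0._source.cluster_name}}"
    }
  }
}
```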


(Martijn Van Groningen) #9

If you change your condition script to ctx.payload.hits.total > 0,
then I expect no history entries with errors.

Your current condition also evaluates to true when there are no hits. This allows the action to be executed, but the templates can't resolve their values, which causes an IndexOutOfBoundsException. The IndexOutOfBoundsException should be fixed in Watcher so that a better error is returned.
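In the watch body, the suggested change would look like this (a sketch; the rest of the watch stays as posted above):

```json
"condition": {
  "script": {
    "inline": "ctx.payload.hits.total > 0"
  }
}
```

With this condition the action only runs when the search actually returned hits, so the hits.hits.0 templates always have a document to resolve against.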


(Sumit Arora) #10

Thanks Martijn, the issue is now resolved after applying the changes you suggested.


(Owulff) #11

I had the same problem with Watcher 1.0.1:
failed to execute action [node_event/send_email]
org.elasticsearch.common.mustache.MustacheException: Index: 0, Size: 0@[query-template:1]
at org.elasticsearch.common.mustache.codes.DefaultCode.get(DefaultCode.java:107)
at org.elasticsearch.common.mustache.codes.ValueCode.execute(ValueCode.java:55)
at org.elasticsearch.common.mustache.codes.DefaultMustache.run(DefaultMustache.java:30)

After I changed the condition to "ctx.payload.hits.total > 0", the exception was gone.

Could you update this in the docs?
https://www.elastic.co/guide/en/watcher/current/watching-marvel-data.html#watching-nodes
