Elastic Watcher Alert Failing (Memory & CPU Usage)

alerting

(PG) #1

I have been trying to set up a watch using the template found here: https://www.elastic.co/guide/en/watcher/current/watching-marvel-data.html

Current setup: 1 master node with 2 backups; 3 data nodes (3 shards each); 1 client node. Elasticsearch is at 2.1.1, fluentd 2.3.0, the latest Watcher plugin, the latest Marvel plugin, and a license is installed (no account). This setup runs on CentOS on AWS.

The two alerts I am setting up are high CPU usage and high JVM memory usage. For testing purposes I have set each alert to notify me when usage is above 3%, with an interval of 10s. Using _plugin/head I can confirm that the watches do run every 10s, but the executions usually end in execution_not_needed or failed.
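For reference, the test setup described above (a 10s interval and a 3% threshold) can be sketched as a watch body assembled in Python. This is a minimal sketch assuming the Watcher 2.x watch layout (trigger.schedule.interval, input.search.request, condition.script); the search request and condition script are copied from the watch record posted later in this thread, the helper name build_mem_watch is hypothetical, and the email action is omitted.

```python
import json


def build_mem_watch(threshold=3, interval="10s"):
    """Hypothetical helper: assemble a memory-usage watch body like the one in this thread."""
    condition = (
        "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; "
        "def latest = ctx.payload.aggregations.minutes.buckets[-1]; "
        "def node = latest.nodes.buckets[0]; "
        f"return node && node.memory && node.memory.value >= {threshold};"
    )
    return {
        # Run the watch on a fixed interval (10s while testing).
        "trigger": {"schedule": {"interval": interval}},
        # Search the Marvel indices for the last two minutes of node stats.
        "input": {
            "search": {
                "request": {
                    "indices": [".marvel-*"],
                    "body": {
                        "size": 0,
                        "query": {"filtered": {"filter": {"range": {
                            "@timestamp": {"gte": "now-2m", "lte": "now"}}}}},
                        "aggs": {
                            "minutes": {
                                "date_histogram": {"field": "@timestamp", "interval": "minute"},
                                "aggs": {
                                    "nodes": {
                                        "terms": {"field": "node.name.raw", "size": 10,
                                                  "order": {"memory": "desc"}},
                                        "aggs": {"memory": {"avg": {
                                            "field": "jvm.mem.heap_used_percent"}}},
                                    }
                                },
                            }
                        },
                    },
                }
            }
        },
        # Groovy condition evaluated against the search response.
        "condition": {"script": condition},
    }


print(json.dumps(build_mem_watch(), indent=2))
```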

When I check the log, the condition section shows:
"condition": {
  "type": "script",
  "status": "failure",
  "reason": "GroovyScriptExecutionException[failed to run inline script [if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def node = latest.nodes.buckets[0]; return node && node.memory && node.memory.value >= 3;] using lang [groovy]]; nested: NullPointerException[Cannot get property 'minutes' on null object]; "
},
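The NullPointerException means that ctx.payload.aggregations was null, so the very first property lookup (.minutes) failed before any threshold was checked. The rough Python model below illustrates that, assuming the payload is a plain dict; NullPointer and groovy_get are hypothetical stand-ins for Groovy's exception and its plain (non-null-safe) property access.

```python
class NullPointer(Exception):
    """Hypothetical stand-in for Groovy's NullPointerException."""


def groovy_get(obj, prop):
    # Mimics Groovy's plain "." property access: a null receiver raises.
    if obj is None:
        raise NullPointer(f"Cannot get property '{prop}' on null object")
    return obj.get(prop)


def condition(payload, threshold=3):
    # Rough Python model of the watch's Groovy condition script.
    buckets = groovy_get(groovy_get(groovy_get(payload, "aggregations"), "minutes"), "buckets")
    if len(buckets) == 0:
        return False
    latest = buckets[-1]
    nodes = groovy_get(groovy_get(latest, "nodes"), "buckets")
    node = nodes[0] if nodes else None  # Groovy yields null for an out-of-range list index
    memory = node.get("memory") if node else None
    return bool(memory) and memory.get("value") is not None and memory["value"] >= threshold


# A payload with no "aggregations" key reproduces the error from the watch record.
try:
    condition({"foo": "bar"})
except NullPointer as err:
    print(err)  # Cannot get property 'minutes' on null object
```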

I have the following set on all data nodes and the master node:
script.inline: on
script.indexed: on

Any help & information is greatly appreciated.


(Alexander Reelsen) #2

Hey,

Can you use the Execute Watch API and paste the output here?

Thanks!

--Alex
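The Execute Watch API is a POST to _watcher/watch/<watch_id>/_execute (the same endpoint Iqbal uses later in this thread). A minimal standard-library sketch, assuming the cluster is reachable at http://localhost:9200 without authentication; execute_watch_request is a hypothetical helper name, and the request is only built here, not sent.

```python
import urllib.request


def execute_watch_request(watch_id, host="http://localhost:9200"):
    """Build (but do not send) a request for the Watcher 2.x Execute Watch API.

    Assumes an unauthenticated cluster at `host`; the empty JSON body runs the
    watch with its own input and condition.
    """
    url = f"{host}/_watcher/watch/{watch_id}/_execute"
    return urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = execute_watch_request("mem_watch")
print(req.get_method(), req.full_url)
# To actually run the watch against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```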


(PG) #3

Hey @spinscale,

After executing the watch, I receive the same error.

{
  "_id": "mem_watch_9-2016-02-08T14:58:55.046Z",
  "watch_record": {
    "watch_id": "mem_watch",
    "state": "executed",
    "trigger_event": {
      "type": "manual",
      "triggered_time": "2016-02-08T14:58:55.037Z",
      "manual": {
        "schedule": {
          "scheduled_time": "2016-02-08T14:58:55.045Z"
        }
      }
    },
    "input": {
      "search": {
        "request": {
          "search_type": "query_then_fetch",
          "indices": [
            ".marvel-*"
          ],
          "types": [],
          "body": {
            "size": 0,
            "query": {
              "filtered": {
                "filter": {
                  "range": {
                    "@timestamp": {
                      "gte": "now-2m",
                      "lte": "now"
                    }
                  }
                }
              }
            },
            "aggs": {
              "minutes": {
                "date_histogram": {
                  "field": "@timestamp",
                  "interval": "minute"
                },
                "aggs": {
                  "nodes": {
                    "terms": {
                      "field": "node.name.raw",
                      "size": 10,
                      "order": {
                        "memory": "desc"
                      }
                    },
                    "aggs": {
                      "memory": {
                        "avg": {
                          "field": "jvm.mem.heap_used_percent"
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "condition": {
      "script": "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def node = latest.nodes.buckets[0]; return node && node.memory && node.memory.value >= 3;"
    },
    "messages": [],
    "result": {
      "execution_time": "2016-02-08T14:58:55.046Z",
      "execution_duration": 97,
      "input": {
        "type": "simple",
        "status": "success",
        "payload": {
          "foo": "bar"
        }
      },
      "condition": {
        "type": "always",
        "status": "success",
        "met": true
      },
      "actions": [
        {
          "id": "send_email",
          "type": "email",
          "status": "failure",
          "transform": {
            "type": "script",
            "status": "failure",
            "reason": "GroovyScriptExecutionException[failed to run inline script [def latest = ctx.payload.aggregations.minutes.buckets[-1]; return latest.nodes.buckets.findAll { return it.memory && it.memory.value >= 3 };] using lang [groovy]]; nested: NullPointerException[Cannot get property 'minutes' on null object]; "
          },
          "reason": "Failed to transform payload"
        }
      ]
    }
  }
}

Regards,
Petro


(Alexander Reelsen) #4

Hey,

I tested this locally. You don't have any Marvel data to check against (that's how I reproduced this error). The watch expects the aggregations data structure to be present in the payload, which only happens if data has been indexed.
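Because the template's scripts dereference ctx.payload.aggregations unconditionally, a run that finds no Marvel data fails instead of simply not matching. One way to make the check tolerate an empty payload is sketched below in Python, assuming the payload mirrors the search response above; in Groovy the same idea would use the safe-navigation operator (?.). memory_alert is a hypothetical name, and it returns the matching node names, similar in spirit to the findAll transform in the watch.

```python
def memory_alert(payload, threshold=3):
    """Return node names in the latest minute bucket whose avg heap usage >= threshold.

    Returns an empty list (instead of raising) when the payload has no
    aggregations, e.g. because no Marvel documents matched the time range.
    """
    aggs = (payload or {}).get("aggregations") or {}
    buckets = (aggs.get("minutes") or {}).get("buckets") or []
    if not buckets:
        return []
    latest = buckets[-1]
    nodes = (latest.get("nodes") or {}).get("buckets") or []
    return [
        n["key"]
        for n in nodes
        if (n.get("memory") or {}).get("value") is not None
        and n["memory"]["value"] >= threshold
    ]


# A condition built on this would fire only when the list is non-empty.
print(memory_alert({"foo": "bar"}))  # []
```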

Have you installed the marvel-agent and is it indexing into your local cluster?

--Alex


(PG) #5

Hey Alex,

Yep, I have the marvel-agent running on the cluster (double-checked). Using _plugin/head/browser, I can see the Marvel files being generated...

The only thing I can think of is that I have not re-indexed the files manually.


(Alexander Reelsen) #6

Hey,

Something is wrong with your watch: it does not execute a search query. Check the result section of your pasted response; it shows a simple input...
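One way to spot this mismatch is to compare, for each section, the type the watch defines with the type that actually ran; both appear in the Execute Watch response. A sketch assuming the response has been parsed into a dict shaped like the one pasted above; section_mismatches is a hypothetical helper. One plausible cause, not confirmed in this thread, is an execute request carrying the Execute Watch API's alternative_input and ignore_condition options, which substitute the payload and skip the condition.

```python
def section_mismatches(response):
    """Map each section whose configured type differs from the executed type."""
    record = response["watch_record"]
    mismatches = {}
    for section in ("input", "condition"):
        configured = next(iter(record[section]))      # e.g. "search", "script"
        executed = record["result"][section]["type"]  # e.g. "simple", "always"
        if configured != executed:
            mismatches[section] = (configured, executed)
    return mismatches


# Trimmed-down version of the response posted in this thread.
response = {
    "watch_record": {
        "input": {"search": {"request": {"indices": [".marvel-*"]}}},
        "condition": {"script": "<condition script omitted>"},
        "result": {
            "input": {"type": "simple", "payload": {"foo": "bar"}},
            "condition": {"type": "always", "met": True},
        },
    }
}
print(section_mismatches(response))
# {'input': ('search', 'simple'), 'condition': ('script', 'always')}
```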

--Alex


(Iqbal Nazir) #7

Hi Alex,
I am also having a similar issue with Watcher: I don't receive any email for the CPU and memory usage watches. I know my email configuration in elasticsearch.yml is correct because I receive email from another watch. I followed https://www.elastic.co/guide/en/watcher/current/watching-marvel-data.html#watching-cpu-usage and set the CPU usage threshold to 5% just to check whether I receive any email. After reading this post I ran POST _watcher/watch/cpu_usage/_execute, which shows output like this...
{
  "_id": "cpu_usage_168-2016-06-09T09:44:12.366Z",
  "watch_record": {
    "watch_id": "cpu_usage",
    "state": "execution_not_needed",
    "trigger_event": {
      "type": "manual",
      "triggered_time": "2016-06-09T09:44:12.366Z",
      "manual": {
        "schedule": {
          "scheduled_time": "2016-06-09T09:44:12.366Z"
...
I have checked in Marvel that my node is consuming more than 10% CPU all the time, but I still don't receive any email. Do you have any solution for me? (I'm a beginner with Elasticsearch, so a detailed answer would be really appreciated.)
Thanks in advance.
--Iqbal
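The state field in the watch record summarizes the outcome: in Watcher 2.x, execution_not_needed means the watch ran but its condition was not met, so no actions (including the email) were attempted. The sketch below maps the states seen in this thread to short hints; the STATE_HINTS table and explain_state helper are illustrative and hypothetical, not an exhaustive list of Watcher states.

```python
# Illustrative, non-exhaustive hints for watch record states (hypothetical table).
STATE_HINTS = {
    "executed": "The actions were run (condition met or ignored); check each action's status.",
    "execution_not_needed": "The condition evaluated to false, so no actions (e.g. email) ran.",
    "throttled": "The condition was met, but the actions were throttled.",
    "failed": "The execution failed; look for a 'reason' in the result.",
}


def explain_state(response):
    """Return the watch record state together with a short hint."""
    state = response["watch_record"]["state"]
    return f"{state}: {STATE_HINTS.get(state, 'see the Watcher documentation for this state')}"


print(explain_state({"watch_record": {"state": "execution_not_needed"}}))
```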


(Alexander Reelsen) #8

Hey,

please open a new thread and include the output of calling the Execute Watch API.

--Alex

