Not receiving email for CPU usage

alerting

(Iqbal Nazir) #1

I am working with Watcher. I don't receive any email for CPU and memory usage. I know my email configuration in
elasticsearch.yml is correct because I receive email for another watch (i.e. event_critical_watch). I have followed https://www.elastic.co/guide/en/watcher/current/watching-marvel-data.html#watching-cpu-usage and set the CPU usage threshold to 5% just to check whether I receive any email. After reading a similar post in this forum, I ran POST _watcher/watch/cpu_usage/_execute, which gives me the output shown in the first comment below.

I have checked in Marvel that my node is consuming more than 10% CPU all
the time. Still, I don't receive any email. Does anyone have a solution for
me?

Just FYI: I have checked the .marvel-* index in Kibana, but I didn't find any "os.cpu.user" field as mentioned in the guide. Could that be the reason? If so, why is that field not there?

(I'm a beginner in Elasticsearch and everything, so a detailed answer
would be really appreciated.)
thanks in advance.
--Iqbal


(Iqbal Nazir) #2

{
"_id": "cpu_usage_220-2016-06-09T10:35:15.687Z",
"watch_record": {
"watch_id": "cpu_usage",
"state": "execution_not_needed",
"trigger_event": {
"type": "manual",
"triggered_time": "2016-06-09T10:35:15.687Z",
"manual": {
"schedule": {
"scheduled_time": "2016-06-09T10:35:15.687Z"
}
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
".marvel-*"
],
"types": [],
"body": {
"size": 0,
"query": {
"filtered": {
"filter": {
"range": {
"@timestamp": {
"gte": "now-2m",
"lte": "now"
}
}
}
}
},
"aggs": {
"minutes": {
"date_histogram": {
"field": "@timestamp",
"interval": "minute"
},
"aggs": {
"nodes": {
"terms": {
"field": "node.name.raw",
"size": 10,
"order": {
"cpu": "desc"
}
},
"aggs": {
"cpu": {
"avg": {
"field": "os.cpu.user"
}
}
}
}
}
}
}
}
}
}
},
"condition": {
"script": "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def node = latest.nodes.buckets[0]; return node && node.cpu && node.cpu.value >= 5;"
},
"messages": [],
"result": {
"execution_time": "2016-06-09T10:35:15.687Z",
"execution_duration": 1,
"input": {
"type": "search",
"status": "success",
"payload": {
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"hits": {
"hits": [],
"total": 0,
"max_score": 0
},
"took": 1,
"timed_out": false,
"aggregations": {
"minutes": {
"buckets": []
}
}
},
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
".marvel-*"
],
"types": [],
"template": {
"template": {
"size": 0,
"query": {
"filtered": {
"filter": {
"range": {
"@timestamp": {
"gte": "now-2m",
"lte": "now"
}
}
}
}
},
"aggs": {
"minutes": {
"date_histogram": {
"field": "@timestamp",
"interval": "minute"
},
"aggs": {
"nodes": {
"terms": {
"field": "node.name.raw",
"size": 10,
"order": {
"cpu": "desc"
}
},
"aggs": {
"cpu": {
"avg": {
"field": "os.cpu.user"
}
}
}
}
}
}
}
},
"params": {
"ctx": {
"metadata": null,
"watch_id": "cpu_usage",
"id": "cpu_usage_220-2016-06-09T10:35:15.687Z",
"trigger": {
"triggered_time": "2016-06-09T10:35:15.687Z",
"scheduled_time": "2016-06-09T10:35:15.687Z"
},
"vars": {},
"execution_time": "2016-06-09T10:35:15.687Z"
}
}
}
}
}
},
"condition": {
"type": "script",
"status": "success",
"met": false
},
"actions": []
}
}
}


(Alexander Reelsen) #3

Hey

The last five lines tell you the important part:

...
"condition": {
"type": "script",
"status": "success",
"met": false
},
...

This means that the condition returned false. You should go back and evaluate the condition more closely. Maybe you referenced a wrong path somewhere?

--Alex


(Iqbal Nazir) #4

Hi..
Thanks for the quick reply. My condition field is the same as mentioned in the link (https://www.elastic.co/guide/en/watcher/current/watching-marvel-data.html#watching-cpu-usage). I have just changed 75 to 5 to check if I receive email.

"condition": {
"script": "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def node = latest.nodes.buckets[0]; return node && node.cpu && node.cpu.value >= 5;"
},

What could be the mistake here? Should I adjust anything according to my settings in Elasticsearch? I have only one node, called 'My 1st node'.
...
Iqbal


(Alexander Reelsen) #5

Hey,

please take your time and examine the result of the execute watch API. Check the result, which contains the search response: you will find that no hits at all are returned and the buckets array is empty. So either you are querying the wrong index, or the index does not exist on your local cluster.
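As a sketch of that check (assuming the execute-API response has been parsed into a dict, like the payload shown earlier in this thread):

```python
# The two things worth checking in the execute result before debugging
# the condition: total hits and the histogram buckets.
payload = {
    "hits": {"hits": [], "total": 0, "max_score": 0},
    "aggregations": {"minutes": {"buckets": []}},
}

no_hits = payload["hits"]["total"] == 0
no_buckets = len(payload["aggregations"]["minutes"]["buckets"]) == 0

if no_hits:
    print("query matched no documents - check the index name and time range")
if no_buckets:
    print("no histogram buckets - the condition will always return false")
```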

--Alex


(Alexander Reelsen) #6

Hey,

I took a look at the example, and I think it does not reflect the current Marvel stats.

  • Can you replace the two occurrences of @timestamp with timestamp?
  • Can you replace the mention of node.name.raw with node.name?
  • Can you change the heap percent mention from jvm.mem.heap_used_percent to node_stats.jvm.mem.heap_used_percent?
  • Last but not least, can you add "types": ["node_stats"] after the indices part so the query is restricted to the correct type?

Let's see if that changes anything!

--Alex


(Iqbal Nazir) #7

Hi Alex,

jvm.mem.heap_used_percent is not in the CPU usage example, but in the memory usage example. Still, I tried with the memory usage example and changed the terms as you suggested, but there is no change in the execute result:

        }
      },
      "condition": {
        "type": "script",
        "status": "success",
        "met": false
      },
      "actions": []
    }
  }

(Alexander Reelsen) #8

Hey,

can you provide the full watch you are testing with? Also, please put it in appropriate formatting tags; see here how to use code blocks. This makes it much easier for others.

--Alex


(Iqbal Nazir) #9

Hi,
please find my complete watch below:

PUT _watcher/watch/mem_watch
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          ".marvel-*"
        ],
        "types": [
          "node_stats"
        ],
        "body": {
          "size": 0,
          "query": {
            "filtered": {
              "filter": {
                "range": {
                  "timestamp": {
                    "gte": "now-2m",
                    "lte": "now"
                  }
                }
              }
            }
          },
          "aggs": {
            "minutes": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "minute"
              },
              "aggs": {
                "nodes": {
                  "terms": {
                    "field": "node.name",
                    "size": 10,
                    "order": {
                      "memory": "desc"
                    }
                  },
                  "aggs": {
                    "memory": {
                      "avg": {
                        "field": "node_stats.jvm.mem.heap_used_percent"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "throttle_period": "2m",
  "condition": {
    "script": "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def node = latest.nodes.buckets[0]; return node && node.memory && node.memory.value >= 5;"
  },
  "actions": {
    "send_email": {
      "transform": {
        "script": "def latest = ctx.payload.aggregations.minutes.buckets[-1]; return latest.nodes.buckets.findAll { return it.memory && it.memory.value >=5 };"
      },
      "email": {
        "to": "user@mycompany.com",
        "subject": "Watcher Notification - HIGH MEMORY USAGE",
        "body": "Nodes with HIGH MEMORY Usage (above 5%):\n\n{{#ctx.payload._value}}\"{{key}}\" - Memory Usage is at {{memory.value}}%\n{{/ctx.payload._value}}"
      }
    }
  }
}

(Alexander Reelsen) #10

Hey,

the field for the terms agg must be source_node.name instead of node.name; my fault.

--Alex


(Iqbal Nazir) #11

Hi Alex,
It has worked :smiley: Thanks a lot.
Is there any way to edit a watch?
To make a small change, I have to DELETE and PUT again to get "created": true


(Alexander Reelsen) #12

Hey,

just put it again, it's fine. The watch will be overwritten.

--Alex


(Iqbal Nazir) #13
Hi,
Thanks again.
Now could you please review my cpu_usage watch? I'm not receiving any email for high CPU usage. Here is my watch for that:


        PUT _watcher/watch/cpu_usage
        {
          "trigger": {
            "schedule": {
              "interval": "1m"
            }
          },
          "input": {
            "search": {
              "request": {
                "indices": 
                "types":["node_stats"]
                [
                  ".marvel-*"
                ],
                "body": {
                  "size" : 0,
                  "query": {
                    "filtered": {
                      "filter": {
                        "range": {
                          "timestamp": {
                            "gte": "now-2m",
                            "lte": "now"
                          }
                        }
                      }
                    }
                  },
                  "aggs": {
                    "minutes": {
                      "date_histogram": {
                        "field": "timestamp",
                        "interval": "minute"
                      },
                      "aggs": {
                        "nodes": {
                          "terms": {
                            "field": "source_node.name",
                            "size": 10,
                            "order": {
                              "cpu": "desc"
                            }
                          },
                          "aggs": {
                            "cpu": {
                              "avg": {
                                "field": "os.cpu.user"
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          },
          "throttle_period": "1m", 
          "condition": {
            "script":  "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def node = latest.nodes.buckets[0]; return node && node.cpu && node.cpu.value >= 5;"
          },
          "actions": {
            "send_email": { 
              "transform": {
                "script": "def latest = ctx.payload.aggregations.minutes.buckets[-1]; return latest.nodes.buckets.findAll { return it.cpu && it.cpu.value >= 5 };"
              },
              "email": {
                "to": "user@mycompany.com", 
                "subject": "Watcher Notification - HIGH CPU USAGE",
                "body": "Nodes with HIGH CPU Usage (above 5%):\n\n{{#ctx.payload._value}}\"{{key}}\" - CPU Usage is at {{cpu.value}}%\n{{/ctx.payload._value}}"
              }
            }
          }
        }

(Alexander Reelsen) #14

Hey,

please execute the query standalone first and see if you get back any buckets. If not, execute a search and see where the documents differ.

--Alex


(Iqbal Nazir) #15

Hi Alex,

I have done POST _watcher/watch/cpu_usage/_execute and I think I haven't got any buckets.

   "buckets": [
                      {
                        "doc_count": 4,
                        "cpu": {
                          "value": null

Then I also compared mem_watch and cpu_usage, and changed os.cpu.user to node_stats.os.cpu.user, with no success so far. I am not an expert, and maybe that's why I'm missing something.


(Alexander Reelsen) #16

Hey,

this is not what I meant. You should manually execute the search operation that you refer to in the watch, and also run a search by index and type without specifying any query - just do GET /.marvel-/node_stats/_search - this allows you to check whether the fields you refer to in your query are actually set in the returned documents.
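Once you have a hit's _source parsed as a dict, checking for a dotted field path can be sketched like this (a hypothetical helper for illustration, not part of any Elasticsearch client):

```python
def has_field(doc, path):
    """Return True if the dotted field path exists in the document."""
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return False
        cur = cur[part]
    return True

# A trimmed node_stats _source (illustrative shape only):
doc = {"node_stats": {"process": {"cpu": {"percent": 3}}}}
print(has_field(doc, "node_stats.os.cpu.user"))          # prints False
print(has_field(doc, "node_stats.process.cpu.percent"))  # prints True
```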

--Alex


(Iqbal Nazir) #17

Hi,
sorry for my ignorance. I did GET /.marvel-/node_stats/_search but received a 404; then I did GET /.marvel-*/node_stats/_search (please note the *), which returned a lot of results, but they don't contain any node_stats.os.cpu.user field.
Thanks.


(Alexander Reelsen) #18

Hey,

if you execute

GET /.marvel-es-*/node_stats/_search
{
  "size" : 1,
  "sort" : [ { "timestamp" : "desc" } ]
}

You can see the JSON structure of the latest node_stats document. The CPU load is now part of the process JSON being returned, and it only covers the load of this process rather than the whole OS, which is currently not monitored.

I will update the watches in the docs over the next few days, but until then you can fix your watch by adapting it to the JSON that is returned.
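To make the new location concrete, here is a sketch of resolving that dotted path against a trimmed node_stats document (an illustrative shape, not a complete Marvel document):

```python
# The CPU figure now lives under "process", so the aggregation field
# has to be the dotted path node_stats.process.cpu.percent.
doc = {
    "node_stats": {
        "process": {"cpu": {"percent": 7}},
        "jvm": {"mem": {"heap_used_percent": 42}},
    }
}

value = doc
for part in "node_stats.process.cpu.percent".split("."):
    value = value[part]
print(value)  # prints 7
```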

--Alex


(Iqbal Nazir) #19

Hi again,

Thanks for the clarification. Could you please tell me how to adjust the watch? After executing your search, I found that the CPU percent is under process in the last node_stats. I then changed the field from node_stats.os.cpu.user to node_stats.process.cpu.percent, with no success. Do I have to change anything else in the watch?


(Alexander Reelsen) #20

This watch works for me:

PUT _watcher/watch/cpu_usage
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          ".marvel-es-1-*"
        ],
        "types" : [
          "node_stats"
        ],
        "body": {
          "size" : 0,
          "query": {
            "filtered": {
              "filter": {
                "range": {
                  "timestamp": {
                    "gte": "now-2m",
                    "lte": "now"
                  }
                }
              }
            }
          },
          "aggs": {
            "minutes": {
              "date_histogram": {
                "field": "timestamp",
                "interval": "minute"
              },
              "aggs": {
                "nodes": {
                  "terms": {
                    "field": "source_node.name",
                    "size": 10,
                    "order": {
                      "cpu": "desc"
                    }
                  },
                  "aggs": {
                    "cpu": {
                      "avg": {
                        "field": "node_stats.process.cpu.percent"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "throttle_period": "30m", <1>
  "condition": {
    "script":  "if (ctx.payload.aggregations.minutes.buckets.size() == 0) return false; def latest = ctx.payload.aggregations.minutes.buckets[-1]; def node = latest.nodes.buckets[0]; return node && node.cpu && node.cpu.value >= 75;"
  },
  "actions": {
    "send_email": { <2>
      "transform": {
        "script": "def latest = ctx.payload.aggregations.minutes.buckets[-1]; return latest.nodes.buckets.findAll { return it.cpu && it.cpu.value >= 75 };"
      },
      "email": {
        "to": "user@example.com", <3>
        "subject": "Watcher Notification - HIGH CPU USAGE",
        "body": "Nodes with HIGH CPU Usage (above 75%):\n\n{{#ctx.payload._value}}\"{{key}}\" - CPU Usage is at {{cpu.value}}%\n{{/ctx.payload._value}}"
      }
    }
  }
}
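For reference, the transform plus the Mustache body template amount to the following (a Python sketch with made-up node names; the actual filtering is done by the Groovy transform and the rendering by Watcher's Mustache engine):

```python
# The transform keeps only the buckets at/above the threshold; the email
# body then iterates over that filtered list as ctx.payload._value.
latest_nodes = [
    {"key": "node-1", "cpu": {"value": 91.0}},
    {"key": "node-2", "cpu": {"value": 12.0}},
]

hot = [n for n in latest_nodes if n.get("cpu") and n["cpu"]["value"] >= 75]

body = "Nodes with HIGH CPU Usage (above 75%):\n\n" + "".join(
    '"{key}" - CPU Usage is at {value}%\n'.format(key=n["key"], value=n["cpu"]["value"])
    for n in hot
)
print(body)
```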

--Alex