I would like to monitor my errors in Elasticsearch.
I would like to get a notification if a certain error occurs more than a certain number of times (let's say 2 times) within a period of one hour.
For example, if these are my error logs from the last hour:
{msg: "storage_failed", level: "error", name: "jim"}
{msg: "connection_closed", level: "error", name: "jack"}
{msg: "error_occurred", level: "error", name: "jay"}
{msg: "storage_failed", level: "error", name: "sam"}
{msg: "connection_closed", level: "error", name: "jack"}
{msg: "connection_closed", level: "error", name: "tom"}
I would get 2 email notifications:
1) error: connection_closed 3 times
2) error: storage_failed 3 times
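To make the counting step concrete, here is a minimal sketch in plain Python (this is not Watcher syntax, only an illustration of the logic I am after). The function name `errors_over_threshold`, the `threshold` parameter, and the variable `sample_logs` are hypothetical; the entries are the six log lines listed above.

```python
from collections import Counter


def errors_over_threshold(logs, threshold):
    """Return {msg: count} for error messages seen more than `threshold` times."""
    # Group by the message text itself, so no message has to be known up front.
    counts = Counter(entry["msg"] for entry in logs if entry["level"] == "error")
    return {msg: n for msg, n in counts.items() if n > threshold}


# The sample log entries from the example above.
sample_logs = [
    {"msg": "storage_failed", "level": "error", "name": "jim"},
    {"msg": "connection_closed", "level": "error", "name": "jack"},
    {"msg": "error_occurred", "level": "error", "name": "jay"},
    {"msg": "storage_failed", "level": "error", "name": "sam"},
    {"msg": "connection_closed", "level": "error", "name": "jack"},
    {"msg": "connection_closed", "level": "error", "name": "tom"},
]

# One notification would be sent per key of the returned dict.
for msg, count in errors_over_threshold(sample_logs, threshold=1).items():
    print(f"error: {msg} {count} times")
```

The threshold is deliberately a parameter, since the exact cut-off is something I would want to tune.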
Once I have received a notification for a certain error, further notifications for that error should be silenced for 1 hour (using throttle_period).
In the example above, notifications for storage_failed and connection_closed would be silenced, but if a different error is received, a notification would still be sent.
Note: my error messages are dynamic; I do not know them in advance.
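The per-error throttling I have in mind can be sketched like this in plain Python (again, not Watcher syntax; the class `PerErrorThrottle` and its method `should_notify` are hypothetical names used only for illustration). The point is that the quiet period is tracked separately for each message, not once for the whole watch.

```python
from datetime import datetime, timedelta


class PerErrorThrottle:
    """Remembers when each error message was last notified (hypothetical helper)."""

    def __init__(self, period):
        self.period = period
        self.last_sent = {}  # msg -> datetime of the last notification sent

    def should_notify(self, msg, now):
        last = self.last_sent.get(msg)
        if last is not None and now - last < self.period:
            return False  # still inside the quiet period for this message
        self.last_sent[msg] = now
        return True


throttle = PerErrorThrottle(timedelta(hours=1))
t0 = datetime(2017, 7, 31, 12, 0)

throttle.should_notify("storage_failed", t0)                          # first alert is sent
throttle.should_notify("storage_failed", t0 + timedelta(minutes=30))  # silenced
throttle.should_notify("error_occurred", t0 + timedelta(minutes=30))  # different error, sent
```

Because the messages are dynamic, the dictionary of last-sent times grows as new messages appear; nothing is configured per message in advance.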
Here is what I tried:
curl -XPUT 'https://elastic-instance:9243/_xpack/watcher/watch/log_error_watch?pretty' -H 'Content-Type: application/json' -d'
{
  "trigger" : {"schedule" : { "interval" : "1m" }},
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "logs" ],
        "body" : {
          "query": {
            "bool": {
              "must": [
                { "match_phrase": { "level": "error" } },
                {"range" : {"timestamp" : {"gte": "now-1h", "lte": "now"}}}
              ]
            }
          },
          "aggs": {
            "error_msg": {
              "terms": {
                "field": "msg.keyword"
              }
            }
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : { "ctx.payload.aggregations.error_msg.buckets.0.doc_count" : { "gt" : 2 }}
  },
  "actions" : {
    "email_administrator" : {
      "throttle_period": "2h",
      "email" : {
        "to" : "example@gmail.com",
        "subject" : "Encountered {{ctx.payload.aggregations.error_msg.buckets.0.doc_count}} errors",
        "body" : "Too many error in the system, see attached data",
        "attachments" : {
          "attached_data" : {
            "data" : {
              "format" : "json"
            }
          }
        },
        "priority" : "high"
      }
    }
  }
}
'
This is the notification I get:
{
  "ctx" : {
    "metadata" : null,
    "watch_id" : "log_error_watch",
    "payload" : {
      "_shards" : {
        "total" : 5,
        "failed" : 0,
        "successful" : 5
      },
      "hits" : {
        "hits" : [
          {
            "_index" : "logs",
            "_type" : "event",
            "_source" : {
              "request" : "GET index.html",
              "status_code" : 404,
              "level" : "error",
              "message" : "ppppp",
              "timestamp" : "2017-07-31T12:05:22.119Z"
            },
            "_id" : "AV2YicoWSIeOW7mgwgRM",
            "_score" : 1.0870113
          },
          {
            "_index" : "logs",
            "_type" : "event",
            "_source" : {
              "request" : "GET index.html",
              "status_code" : 404,
              "level" : "error",
              "message" : "ooooooooo",
              "timestamp" : "2017-07-31T12:05:22.119Z"
            },
            "_id" : "AV2YifZ1SIeOW7mgwgRR",
            "_score" : 1.0870113
          },
          ...
        ],
        "total" : 4,
        "max_score" : 1.1823215
      },
      "took" : 1,
      "timed_out" : false,
      "aggregations" : {
        "error_msg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "doc_count" : 2,
              "key" : "ooooooooo"
            },
            {
              "doc_count" : 2,
              "key" : "ppppp"
            }
          ]
        }
      }
    },
    "id" : "log_error_watch_6fd76d9d-05bc-4e75-962e-26f86259b88f-2017-07-31T12:10:02.895Z",
    "trigger" : {
      "triggered_time" : "2017-07-31T12:10:02.895Z",
      "scheduled_time" : "2017-07-31T12:10:02.895Z"
    },
    "vars" : { },
    "execution_time" : "2017-07-31T12:10:02.895Z"
  }
}
Now, how do I iterate over all buckets of the aggregation and send a notification for each one whose doc_count is greater than 2?
And how do I set the throttle_period per error message, rather than for the watch as a whole?
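To be explicit about the per-bucket logic I am looking for, here it is as a minimal Python sketch over the payload shape shown above (this is not Watcher syntax, and I do not know whether Watcher can express it directly). The function name `buckets_to_alert` and the `threshold` parameter are hypothetical; `payload` is a trimmed copy of `ctx.payload` from the notification.

```python
def buckets_to_alert(payload, threshold=2):
    """Yield (key, doc_count) for every bucket whose doc_count exceeds `threshold`."""
    # Walk every bucket of the terms aggregation, not only buckets.0.
    for bucket in payload["aggregations"]["error_msg"]["buckets"]:
        if bucket["doc_count"] > threshold:
            yield bucket["key"], bucket["doc_count"]


# Trimmed copy of ctx.payload from the notification above.
payload = {
    "aggregations": {
        "error_msg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"doc_count": 2, "key": "ooooooooo"},
                {"doc_count": 2, "key": "ppppp"},
            ],
        }
    }
}

# Each yielded pair would become one notification, throttled by its own key.
for key, doc_count in buckets_to_alert(payload, threshold=1):
    print(f"error: {key} {doc_count} times")
```

My current watch only looks at `buckets.0`, so it can never report more than the single most frequent message; the loop above is what I would like to reproduce.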