I would like to monitor my errors in Elasticsearch.
I would like to get a notification if a certain error occurs at least a certain number of times (let's say 2 times) within one hour.
For example, if these are my error logs from the last hour:
{msg: "storage_failed", level: "error", name: "jim"}
{msg: "connection_closed", level: "error", name: "jack"}
{msg: "error_occurred", level: "error", name: "jay"}
{msg: "storage_failed", level: "error", name: "sam"}
{msg: "connection_closed", level: "error", name: "jack"}
{msg: "connection_closed", level: "error", name: "tom"}
I would get 2 email notifications:
1) error: connection_closed 3 times
2) error: storage_failed 2 times
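To make the expected result concrete, here is a small Python sketch, separate from Watcher, that groups the six example entries by `msg` (the same grouping the `terms` aggregation on `msg.keyword` performs) and keeps those that reach the threshold. The threshold of 2 and the "at least" comparison are assumptions chosen to match the two notifications listed; counting the entries gives connection_closed 3 times and storage_failed 2 times.

```python
from collections import Counter

# The six example log entries from the question (last hour).
logs = [
    {"msg": "storage_failed", "level": "error", "name": "jim"},
    {"msg": "connection_closed", "level": "error", "name": "jack"},
    {"msg": "error_occurred", "level": "error", "name": "jay"},
    {"msg": "storage_failed", "level": "error", "name": "sam"},
    {"msg": "connection_closed", "level": "error", "name": "jack"},
    {"msg": "connection_closed", "level": "error", "name": "tom"},
]

THRESHOLD = 2  # assumed: notify when an error occurs at least this many times


def errors_over_threshold(entries, threshold):
    """Count error entries per msg and keep those with count >= threshold."""
    counts = Counter(e["msg"] for e in entries if e["level"] == "error")
    return {msg: n for msg, n in counts.items() if n >= threshold}


# One line per notification, sorted by message for stable output.
for msg, n in sorted(errors_over_threshold(logs, THRESHOLD).items()):
    print(f"error: {msg} {n} times")
```

With a threshold of 3 instead, only connection_closed would qualify.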
Once I have received a notification for a certain error, further notifications for that error should be silenced for 1 hour (using throttle_period).
In the example above, notifications for storage_failed and connection_closed would be silenced,
but if a different error comes in, it should still trigger a notification.
Note: my error messages are dynamic; I do not know them in advance.
Here is what I tried:
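The per-error quiet period described above amounts to remembering, for each message string, when it was last notified. As far as I know, Watcher's `throttle_period` applies to an action as a whole rather than per aggregation key, so this state has to be kept somewhere. The Python sketch below keys an in-memory dict by the message itself, so previously unseen messages need no configuration. The class name `PerErrorThrottle` and the one-hour period are illustrative assumptions; a real setup would need to persist this state between runs.

```python
from datetime import datetime, timedelta

THROTTLE = timedelta(hours=1)  # assumed quiet period per error message


class PerErrorThrottle:
    """Hypothetical helper: remembers when each error message was last notified."""

    def __init__(self, period):
        self.period = period
        self.last_sent = {}  # msg -> datetime of its last notification

    def should_notify(self, msg, now):
        last = self.last_sent.get(msg)
        if last is not None and now - last < self.period:
            return False  # still inside this message's quiet period
        self.last_sent[msg] = now  # record this notification
        return True


throttle = PerErrorThrottle(THROTTLE)
start = datetime(2017, 7, 31, 12, 0)
print(throttle.should_notify("storage_failed", start))
print(throttle.should_notify("storage_failed", start + timedelta(minutes=10)))
print(throttle.should_notify("error_occurred", start + timedelta(minutes=10)))
```

A repeated storage_failed inside the hour is suppressed, while error_occurred, a message never seen before, still goes through.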
curl -XPUT 'https://elastic-instance:9243/_xpack/watcher/watch/log_error_watch?pretty' -H 'Content-Type: application/json' -d'
{
  "trigger" : {"schedule" : { "interval" : "1m" }},
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "logs" ],
        "body" : {
          "query": {
            "bool": {
              "must": [
                { "match_phrase": { "level": "error" } },
                {"range" : {"timestamp" : {"gte": "now-1h", "lte": "now"}}}
              ]
            }
          },
          "aggs": {
            "error_msg": {
              "terms": {
                "field": "msg.keyword"
              }
            }
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : { "ctx.payload.aggregations.error_msg.buckets.0.doc_count" : { "gt" : 2 }}
  },
  "actions" : {
    "email_administrator" : {
      "throttle_period": "2h",
      "email" : {
        "to" : "example@gmail.com",
        "subject" : "Encountered {{ctx.payload.aggregations.error_msg.buckets.0.doc_count}} errors",
        "body" : "Too many error in the system, see attached data",
        "attachments" : {
          "attached_data" : {
            "data" : {
              "format" : "json"
            }
          }
        },
        "priority" : "high"
      }
    }
  }
}
'
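The `compare` condition above reads a single dotted path, so it only ever looks at the first bucket. A `terms` aggregation orders buckets by `doc_count` descending by default, so bucket 0 is the most frequent message, but every other bucket is ignored in both the condition and the subject line. The Python sketch below mimics how such a path is resolved, with numeric segments treated as list indices, using the aggregation part of the notification payload shown below. The function name `resolve` is a hypothetical helper, not a Watcher API.

```python
def resolve(root, path):
    """Follow a dotted path; numeric segments index into lists."""
    node = root
    for part in path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


# Aggregation part of the notification payload (hits omitted).
watch = {
    "ctx": {
        "payload": {
            "aggregations": {
                "error_msg": {
                    "buckets": [
                        {"doc_count": 2, "key": "ooooooooo"},
                        {"doc_count": 2, "key": "ppppp"},
                    ]
                }
            }
        }
    }
}

path = "ctx.payload.aggregations.error_msg.buckets.0.doc_count"
value = resolve(watch, path)
print(value, value > 2)  # only bucket 0 is compared against the threshold
```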
This is the notification I get:
{
  "ctx" : {
    "metadata" : null,
    "watch_id" : "log_error_watch",
    "payload" : {
      "_shards" : {
        "total" : 5,
        "failed" : 0,
        "successful" : 5
      },
      "hits" : {
        "hits" : [
          {
            "_index" : "logs",
            "_type" : "event",
            "_source" : {
              "request" : "GET index.html",
              "status_code" : 404,
              "level" : "error",
              "message" : "ppppp",
              "timestamp" : "2017-07-31T12:05:22.119Z"
            },
            "_id" : "AV2YicoWSIeOW7mgwgRM",
            "_score" : 1.0870113
          },
          {
            "_index" : "logs",
            "_type" : "event",
            "_source" : {
              "request" : "GET index.html",
              "status_code" : 404,
              "level" : "error",
              "message" : "ooooooooo",
              "timestamp" : "2017-07-31T12:05:22.119Z"
            },
            "_id" : "AV2YifZ1SIeOW7mgwgRR",
            "_score" : 1.0870113
          },
          ...
        ],
        "total" : 4,
        "max_score" : 1.1823215
      },
      "took" : 1,
      "timed_out" : false,
      "aggregations" : {
        "error_msg" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "doc_count" : 2,
              "key" : "ooooooooo"
            },
            {
              "doc_count" : 2,
              "key" : "ppppp"
            }
          ]
        }
      }
    },
    "id" : "log_error_watch_6fd76d9d-05bc-4e75-962e-26f86259b88f-2017-07-31T12:10:02.895Z",
    "trigger" : {
      "triggered_time" : "2017-07-31T12:10:02.895Z",
      "scheduled_time" : "2017-07-31T12:10:02.895Z"
    },
    "vars" : { },
    "execution_time" : "2017-07-31T12:10:02.895Z"
  }
}
Now, how do I iterate over all the buckets of the aggregation and send a notification for each one whose doc_count reaches the threshold?
And how do I set the throttle_period per error message?
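Outside Watcher, the iteration being asked about can be sketched in a few lines of Python. The sketch walks every bucket in the `aggregations` section of the payload above, keeps those whose `doc_count` reaches the threshold, and applies a one-hour quiet period per error key. The threshold of 2, the `last_sent` dict, and the function name `notifications` are illustrative assumptions, and the state is kept only in memory.

```python
from datetime import datetime, timedelta

THRESHOLD = 2                  # assumed: notify when doc_count >= 2
THROTTLE = timedelta(hours=1)  # assumed per-error quiet period

# "aggregations" section of the payload shown in the question.
aggregations = {
    "error_msg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {"doc_count": 2, "key": "ooooooooo"},
            {"doc_count": 2, "key": "ppppp"},
        ],
    }
}


def notifications(aggs, now, last_sent):
    """Return one message per bucket that meets THRESHOLD and is not throttled.

    last_sent maps error key -> datetime of its last notification and is
    updated in place (hypothetical state kept by the caller).
    """
    out = []
    for bucket in aggs["error_msg"]["buckets"]:
        key, count = bucket["key"], bucket["doc_count"]
        if count < THRESHOLD:
            continue  # not frequent enough to report
        last = last_sent.get(key)
        if last is not None and now - last < THROTTLE:
            continue  # already reported within the quiet period
        last_sent[key] = now
        out.append(f"error: {key} {count} times")
    return out


state = {}
now = datetime(2017, 7, 31, 12, 10)
print(notifications(aggregations, now, state))
print(notifications(aggregations, now + timedelta(minutes=1), state))
```

The second call returns an empty list because both errors are still within their quiet period. Inside Watcher itself, a Mustache section such as `{{#ctx.payload.aggregations.error_msg.buckets}}{{key}}: {{doc_count}} {{/ctx.payload.aggregations.error_msg.buckets}}` should list every bucket in a single email body, although that still sends one email per watch execution. Also note that a `terms` aggregation returns only its top 10 buckets unless `size` is raised, which matters when the messages are not known in advance.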