By host Watch strategy

alerting

#1

Hi,

I'm trying to set up a few watches that check for more than X 5xx Apache errors on my servers during the last Y period, based on Logstash inputs. I'm looking for the best strategy to run the check per host instead of across all hosts.
My goal is to avoid creating one watch per node and to receive an email alert, split by host, with the error code and the problematic request.

What should I use to make my life easier? Faceted search? Aggregations? How can I use that to write my condition? How can I do the split in the body of my email?

How do you handle this kind of problem?

Thanks in advance for your insight.

Maxime

Here is my current watch.

PUT _watcher/watch/apache_500_error
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "logstash-apache-*"
        ],
        "body": {
          "query": {
            "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "and": [
                  {
                    "numeric_range": {
                      "response": {
                        "gte": 500,
                        "lt": 600
                      }
                    }
                  },
                  {
                    "range": {
                      "@timestamp": {
                        "gte": "now-1h",
                        "lte": "now"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 20
      }
    }
  },
  "throttle_period": "1h",
  "actions": {
    "send_email": {
      "email": {
        "from": "from@mydomain.com",
        "to": [
          "to@mydomain.com"
        ],
        "subject": "{{ctx.payload.hits.total}} Errors on apache during the last hour",
        "body": {
          "html": "Error are on these urls : <br/> <table><tr><td>timestamp</td><td>error code</td><td>request</td></tr>{{#ctx.payload.hits.hits}}<tr><td>{{_source.timestamp}}</td><td>{{_source.response}}</td><td>{{_source.request}}</td></tr>{{/ctx.payload.hits.hits}}</table>"
        },
        "attach_data": true
      }
    }
  }
}

(Steve Kearns) #2

Hi Maxime,

This is a great use-case for Watcher. I had trouble understanding exactly what you wanted in the email. Can you provide an example of exactly what information you want included in the email notification? That way, we're more likely to provide a useful example :)

Thanks!


#3

Hi Steve,

Thanks for your answer.
The ideal email for me would look like:

Server1

received 5 500 errors on : /url1
received 10 500 errors on : /url2

Server4

received 15 500 errors on : /url1

Server7

received 10 500 errors on : /url3

The server name, error code and URL are part of my indexed logs. I only want to see the servers where the number of 5xx errors during the last time frame is > 10.
Grouping by URL would be nice to have. I don't mind having duplicates inside a server block.

Thanks,

Maxime


#4

Hi again,

I kept working on this and decided to use aggregations.

Here is the kind of email I'll be receiving:

Error are on these urls : 

server1 : 30
count	                                          url
10 (6 error 502, 4 error 500, )	                 /url1
15 (10 error 502, 2 error 503, 4 error 500, )	 /url2 
5 (5 error 502, )                                /url3 

server2 : 20
count	                         url
10 (6 error 502, 4 error 500, )	/url4
10 (10 error 502 )              /url2 

And here is the watch I'm using:

PUT _watcher/watch/apache_500_error_by_server
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": [
          "logstash-apache-*"
        ],
        "body": {
          "query": {
            "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "and": [
                  {
                    "numeric_range": {
                      "response": {
                        "gte": 500,
                        "lt": 600
                      }
                    }
                  },
                  {
                    "range": {
                      "@timestamp": {
                        "gte": "now-1h",
                        "lte": "now"
                      }
                    }
                  }
                ]
              }
            }
          },
          "aggs": {
            "hosts": {
              "terms": {
                "field": "host.raw",
                "size": 10
              },
              "aggs": {
                "request": {
                  "terms": {
                    "field": "request.raw",
                    "size": 10
                  },
                  "aggs": {
                    "responses": {
                      "terms": {
                        "field": "response",
                        "size": 10
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "inline": "if (ctx.payload.hits.total < 1) return false; for (term in ctx.payload.aggregations.hosts.buckets) { if (term.doc_count > 10) { return true; } }; return false;"
    }
  },
  "throttle_period": "1h",
  "actions": {
    "send_email": {
      "email": {
        "from": "from@myhost.com",
        "to": [
          "maximed@myhost.com"
        ],
        "subject": "{{ctx.payload.hits.total}} Errors on apache during the last hour",
        "body": {
          "html": "Error are on these urls : <br/>{{#ctx.payload.aggregations.hosts.buckets}} {{key}} : {{doc_count}} <table> <tr> <td>count</td> <td>url</td> </tr> {{#request.buckets}} <tr> <td>{{doc_count}} ({{#responses.buckets}}{{doc_count}} error {{key}}, {{/responses.buckets}})</td> <td>{{key}}</td> </tr> {{/request.buckets}} </table> <br/><br/> {{/ctx.payload.aggregations.hosts.buckets}} <br/>"
        },
        "attach_data": true
      }
    }
  }
}
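
One possible refinement, offered as an untested sketch: the email template above lists every host bucket, including hosts with 10 or fewer errors, while the goal was to show only servers with more than 10. The terms aggregation accepts a `min_doc_count` parameter, so the `hosts` aggregation could drop those hosts before the email is rendered. The value 11 is an assumption derived from the "> 10" threshold in the condition. The nested `request` and `responses` aggregations would stay exactly as in the watch above and are left out here for brevity.

```json
{
  "aggs": {
    "hosts": {
      "terms": {
        "field": "host.raw",
        "size": 10,
        "min_doc_count": 11
      }
    }
  }
}
```

With this in place, every bucket that reaches the template already exceeds the threshold. The subject line would still report `ctx.payload.hits.total`, which counts all 5xx errors across all hosts.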
