A watch alert example based on two different searches using CHAIN input and Painless script condition

(Joseph Dissmeyer) #1

Good afternoon! I wanted to share this solution with the community.

This is an example of a watch where you need to run two searches against the same data index looking for two different strings (such as "ERROR" and "TIMEOUT"). Then in the condition you need to trigger a notification alert only if the first search returns results and the second one does not.

Scenario:
You harvest application logs for a critical business application.
Inside of these logs there are entries identifying payment processing connection occurrences, denials, approvals, as well as connection timeouts.
Each log entry is a single line.
You need to create an alert that checks the last 10 minutes of log data for timeout errors and for missing payment approvals within the same window.
Creating an alert for timeouts alone is simple enough. But timeouts may be seen for only a few seconds, with payment processing appearing normal before and after, in which case an alert isn't needed. You want to avoid false alerts, especially if the issue resolved itself and a full outage did not occur.

To build this watch, we will use the CHAIN input to run the two individual searches, then use a simple Painless script as the logic for the condition.

Here is the example:

{
  "trigger": {
    "schedule": {
      "interval": "10m"
    }
  },
  "input": {
    "chain": {
      "inputs": [
        {
          "first": {
            "search": {
              "request": {
                "search_type": "query_then_fetch",
                "indices": [
                  "your_appl_logs-*"
                ],
                "types": [],
                "body": {
                  "query": {
                    "bool": {
                      "must": [
                        {
                          "query_string": {
                            "query": "\"statusCode\\\":timeout\""
                          }
                        },
                        {
                          "range": {
                            "@timestamp": {
                              "gte": "now-10m"
                            }
                          }
                        }
                      ]
                    }
                  },
                  "_source": [
                    "message"
                  ]
                }
              }
            }
          }
        },
        {
          "second": {
            "search": {
              "request": {
                "search_type": "query_then_fetch",
                "indices": [
                  "your_appl_logs-*"
                ],
                "types": [],
                "body": {
                  "query": {
                    "bool": {
                      "must": [
                        {
                          "query_string": {
                            "query": "\"statusCode\\\":approved\""
                          }
                        },
                        {
                          "range": {
                            "@timestamp": {
                              "gte": "now-10m"
                            }
                          }
                        }
                      ]
                    }
                  },
                  "_source": [
                    "message"
                  ]
                }
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "script": {
      "source": "return ctx.payload.first.hits.total > 0 && ctx.payload.second.hits.total == 0",
      "lang": "painless"
    }
  },
  "actions": {
    "email_users": {
      "email": {
        "profile": "standard",
        "attachments": {
          "copy_of_search_results.txt": {
            "data": {
              "format": "json"
            }
          }
        },
        "priority": "high",
        "to": [
          "support@piedpiper.com"
        ],
        "subject": "ELASTIC STACK ALERT: Payment processing issues in Application!",
        "body": {
          "html": "<b>--Alerts Notification Details--</b><br>This alert triggered because a total of <b>{{ctx.payload.first.hits.total}}</b> timeout logs and <b>{{ctx.payload.second.hits.total}}</b> payment approvals were found in the application within the last ten minutes!<br><br><b>ALERT NAME:</b> {{ctx.watch_id}}<br><b>Link to Kibana Dashboard:</b> https://your.secure.link.here"
        }
      }
    }
  },
  "throttle_period": "1h"
}

As you can see, this is a very simple rule that produces a better alert. The concepts should be easy to understand as well.

So how else could this alert example be adapted?
One adaptation is alerting on a specific level of errors across different indices.
Let's say there is an error type in the application 'A' log, and another error type in application 'B' that correlates to the first error found in application 'A', and each application log has its own data index. You could adapt this example to search each individual index for the correlation, then alert based on the condition for each result!
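The two-index adaptation described above might look like the following sketch of the `input` and `condition` sections. The index names `app_a_logs-*` and `app_b_logs-*`, the chain input names `app_a` and `app_b`, and the threshold of 5 are placeholders chosen for illustration, not values from a real setup. The search strings reuse the "ERROR" and "TIMEOUT" examples mentioned earlier.

```json
{
  "input": {
    "chain": {
      "inputs": [
        {
          "app_a": {
            "search": {
              "request": {
                "indices": ["app_a_logs-*"],
                "body": {
                  "query": {
                    "bool": {
                      "must": [
                        { "query_string": { "query": "\"ERROR\"" } },
                        { "range": { "@timestamp": { "gte": "now-10m" } } }
                      ]
                    }
                  }
                }
              }
            }
          }
        },
        {
          "app_b": {
            "search": {
              "request": {
                "indices": ["app_b_logs-*"],
                "body": {
                  "query": {
                    "bool": {
                      "must": [
                        { "query_string": { "query": "\"TIMEOUT\"" } },
                        { "range": { "@timestamp": { "gte": "now-10m" } } }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "script": {
      "source": "return ctx.payload.app_a.hits.total >= 5 && ctx.payload.app_b.hits.total >= 5",
      "lang": "painless"
    }
  }
}
```

Here the condition fires only when both indices show at least the placeholder number of matching entries in the same ten-minute window.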

I hope this helps others in the community. Happy New Year!

  • Joey D

(Alexander Reelsen) #2

Hey,

Thanks so much for investing the time and sharing your watches. Much appreciated!

I don't know about your app/requirements, but what if you executed only one query, filtering on the timestamp of the last 10 minutes, and then had a terms aggregation on the status code field? Then you would get the counts of approved and timeout in one single query, without even having to look at the counts.
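A sketch of that single-query idea, assuming the index has a keyword field named `statusCode` (the aggregation name `status_codes` is made up for illustration). Because a terms bucket only appears when at least one document has that value, the condition can check which keys are present rather than compare counts.

```json
{
  "input": {
    "search": {
      "request": {
        "indices": ["your_appl_logs-*"],
        "body": {
          "size": 0,
          "query": {
            "range": { "@timestamp": { "gte": "now-10m" } }
          },
          "aggs": {
            "status_codes": {
              "terms": { "field": "statusCode" }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "def keys = ctx.payload.aggregations.status_codes.buckets.stream().map(b -> b.key).collect(Collectors.toList()); return keys.contains('timeout') && !keys.contains('approved');",
      "lang": "painless"
    }
  }
}
```

Note that a terms aggregation returns only the top buckets by default, so if the field can hold many distinct values, the aggregation's `size` may need raising for this check to be reliable.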

Hope this helps!

--Alex


(Joseph Dissmeyer) #3

@spinscale Thank you Alexander! It is my pleasure to share. I know that if I share some of my own examples it will encourage others to share theirs as well, further growing the collective knowledge of the community.

Regarding your question... "have a terms aggregation on the status code field?" Yes! This is a possible solution, but only if "statusCode" is a field in the index. If statusCode is its own field in the index, it would be a MUCH better solution: it would most likely cause much less stress, use fewer resources, and the search would be much faster.

Sadly in my case, and probably many others, it is not. Most of the application logs that I work with are traditional and unstructured. Each log starts with a time and date, but the rest of the log line is pure 'data', completely unstructured. So unfortunately I just have to work with that for the time being.

One possibility is to use Logstash to enrich the logs by grokking the statusCode into its own field to simplify things. I just might look into that. :)


(Alexander Reelsen) #4

Hey,

You may not need to use Logstash, as Elasticsearch has such a feature as well for simple cases (i.e. just using grok). It is called ingest and you can read about it here
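As a rough sketch, an ingest pipeline with a grok processor for this case could have a body like the one below, sent to `PUT _ingest/pipeline/<name>` and then referenced at index time through the `pipeline` request parameter. The grok pattern is an assumption about the log format (a `statusCode` key followed by a colon and a single word, with optional quotes), so it would need adjusting to the real log lines. The `message` field is the one used in the watch above.

```json
{
  "description": "Extract statusCode from unstructured log lines (illustrative pattern)",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["statusCode\"?:\"?%{WORD:statusCode}"],
        "ignore_failure": true
      }
    }
  ]
}
```

With `ignore_failure` set, log lines that carry no status code are indexed unchanged instead of being rejected.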

--Alex


(Joseph Dissmeyer) #5

Ah yes I forgot about that! Good point. Yes ingest can be used as well.

