Good afternoon! I wanted to share this solution with the community.
This is an example of a watch that runs two searches against the same data index, looking for two different strings (such as "ERROR" and "TIMEOUT"), and whose condition triggers a notification only if the first search returns results and the second one does not.
Scenario:
You harvest application logs for a critical business application.
Inside these logs are entries recording payment processing events: connection attempts, denials, approvals, and connection timeouts.
Each log entry is a single line.
You need to create an alert that examines the last 10 minutes of log data for timeout errors, as well as for missing payment approvals, within the same window.
Creating an alert for timeouts is simple enough. In this case, however, it is possible that timeouts were seen for only a few seconds, and that payment processing appeared normal both before and after, so an alert isn't needed. You want to avoid false alerts, especially when the issue seems to have resolved itself and a full outage did not occur.
To build this watch, we will use the chain input to run the two individual searches, then use a simple Painless script as the condition logic.
Here is the example:
{
  "trigger": {
    "schedule": {
      "interval": "10m"
    }
  },
  "input": {
    "chain": {
      "inputs": [
        {
          "first": {
            "search": {
              "request": {
                "search_type": "query_then_fetch",
                "indices": [
                  "application_logs-*"
                ],
                "types": [],
                "body": {
                  "query": {
                    "bool": {
                      "must": [
                        {
                          "query_string": {
                            "query": "\"statusCode\\\":timeout\""
                          }
                        },
                        {
                          "range": {
                            "@timestamp": {
                              "gte": "now-10m"
                            }
                          }
                        }
                      ]
                    }
                  },
                  "_source": [
                    "message"
                  ]
                }
              }
            }
          }
        },
        {
          "second": {
            "search": {
              "request": {
                "search_type": "query_then_fetch",
                "indices": [
                  "application_logs-*"
                ],
                "types": [],
                "body": {
                  "query": {
                    "bool": {
                      "must": [
                        {
                          "query_string": {
                            "query": "\"statusCode\\\":approved\""
                          }
                        },
                        {
                          "range": {
                            "@timestamp": {
                              "gte": "now-10m"
                            }
                          }
                        }
                      ]
                    }
                  },
                  "_source": [
                    "message"
                  ]
                }
              }
            }
          }
        }
      ]
    }
  },
  "condition": {
    "script": {
      "source": "return ctx.payload.first.hits.total > 0 && ctx.payload.second.hits.total == 0",
      "lang": "painless"
    }
  },
  "actions": {
    "email_users": {
      "email": {
        "profile": "standard",
        "attachments": {
          "copy_of_search_results.txt": {
            "data": {
              "format": "json"
            }
          }
        },
        "priority": "high",
        "to": [
          "support@piedpiper.com"
        ],
        "subject": "ELASTIC STACK ALERT: Payment processing issues in Application!",
        "body": {
          "html": "<b>--Alerts Notification Details--</b><br>This alert triggered because a total of <b>{{ctx.payload.first.hits.total}}</b> timeout logs and <b>{{ctx.payload.second.hits.total}}</b> payment approvals were found in the application within the last ten minutes!<br><br><b>ALERT NAME:</b> {{ctx.watch_id}}<br><b>Link to Kibana Dashboard:</b> https://your.secure.link.here"
        }
      }
    }
  },
  "throttle_period": "1h"
}
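Tip: before saving the watch, you can dry-run it with the Execute Watch API. A minimal sketch (Kibana Dev Tools console syntax; paste the full watch definition from above where indicated) that skips the condition check and simulates the email action rather than actually sending it:

```json
POST _watcher/watch/_execute
{
  "ignore_condition": true,
  "action_modes": {
    "_all": "simulate"
  },
  "watch": {
    ... paste the full watch definition from above here ...
  }
}
```

The response shows each search's payload, so you can confirm the query strings actually match your log entries before the alert goes live.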
As you can see, this is a very simple rule that produces a much better alert, and the concepts should be easy to understand as well.
So how else could this alert example be adapted?
One way to adapt this is when you need to alert based on a specific level of errors across different indices.
Let's say there is an error type in application 'A''s log, and another error type in application 'B' that correlates to the first error found in application 'A', and each application's log has its own data index. You could adapt this example to search each index individually for the correlation, then alert based on a condition over both results!
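A minimal sketch of that adaptation (the index patterns, input names, and query strings here are hypothetical; adjust them to your own data): each chain input points at one application's index, and the condition fires only when both searches return hits.

```json
"input": {
  "chain": {
    "inputs": [
      {
        "app_a_errors": {
          "search": {
            "request": {
              "indices": [ "app_a_logs-*" ],
              "body": {
                "query": {
                  "bool": {
                    "must": [
                      { "query_string": { "query": "\"error_type_A\"" } },
                      { "range": { "@timestamp": { "gte": "now-10m" } } }
                    ]
                  }
                }
              }
            }
          }
        }
      },
      {
        "app_b_errors": {
          "search": {
            "request": {
              "indices": [ "app_b_logs-*" ],
              "body": {
                "query": {
                  "bool": {
                    "must": [
                      { "query_string": { "query": "\"error_type_B\"" } },
                      { "range": { "@timestamp": { "gte": "now-10m" } } }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    ]
  }
},
"condition": {
  "script": {
    "source": "return ctx.payload.app_a_errors.hits.total > 0 && ctx.payload.app_b_errors.hits.total > 0",
    "lang": "painless"
  }
}
```

Note that each chain input's name ("app_a_errors", "app_b_errors") becomes the key under ctx.payload, which is how the condition script addresses each result set.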
I hope this helps others in the community. Happy New Year!
- Joey D