We have a typical ELK stack deployment in our organization. Logstash pumps our application logs into Elasticsearch, with each log message being mapped to a document. We use Kibana or the REST interface to Elasticsearch to search for logs.
We want to implement an alerting system that alerts us to any errors in logs. The thing about logs, or specifically about errors in the logs, is that there can be multiple occurrences of them. There will be differences between these multiple occurrences of the same error, and these differences can be classified into two broad types:
- Contextual, use-case related information (organization id, device id, etc.)
- Line number (if there was a deployment between occurrences), library version in the stack trace of exceptions
The alerting system that I want to build must be able to recognise that multiple occurrences of such errors are the same and thus not send alerts for every occurrence of them.
To this end I have come up with the following high-level solution:
- Baseline errors in some storage medium
- Query Elasticsearch on a regular basis for errors. Compare each error with the ones in the baseline using a similarity algorithm (TF-IDF) to decide if the error is a new one and an alert must be sent.
Now that I have established the context, I want to know:
- Is there any inbuilt support/plugin with either Elasticsearch or Logstash to help me with this use case?
- I am sure people must have already felt the need for such an alerting system. How are such things generally implemented?
Thank you for reading through until here.
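As an added example, not part of the original question, here is a stand-alone Python sketch of the comparison step of the approach proposed above: strip the parts that vary between occurrences of the same error (ids, line numbers, versions) into placeholders, vectorise the messages with TF-IDF, and compare each newly fetched error against the baseline by cosine similarity. The log messages are constructed for the example, the normalisation regexes and the 0.5 threshold are assumptions to be tuned, and the periodic Elasticsearch query itself is left out; the strings stand in for the `message` field of the documents such a query would return.

```python
import math
import re
from collections import Counter

# Constructed sample messages (not real logs) standing in for the `message`
# field of documents returned by a periodic Elasticsearch query.
BASELINE_ERRORS = [
    "ERROR DeviceSync failed for org=8f3a2c1b device=4412: "
    "java.net.SocketTimeoutException: connect timed out\n"
    "\tat com.example.sync.DeviceClient.connect(DeviceClient.java:118)",
    "ERROR Payment webhook rejected for org=77a1b2c3: "
    "java.lang.IllegalArgumentException: signature mismatch\n"
    "\tat com.example.pay.Webhook.verify(Webhook.java:42)",
]

# Same SocketTimeoutException for another tenant/device, with the line number
# shifted as it would be after a deployment: should NOT raise a new alert.
REPEAT_ERROR = (
    "ERROR DeviceSync failed for org=3f2504e0-4f89-11d3-9a0c-0305e82c3301 "
    "device=9901: java.net.SocketTimeoutException: connect timed out\n"
    "\tat com.example.sync.DeviceClient.connect(DeviceClient.java:121)"
)

# A genuinely new failure: should raise an alert.
NEW_ERROR = (
    "ERROR Cache flush failed: java.lang.NullPointerException\n"
    "\tat com.example.cache.Flusher.run(Flusher.java:77)"
)

# Assumed normalisation rules: contextual ids, versions and line numbers are
# replaced by placeholders so that they do not count as distinct tokens.
_RULES = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
                re.I), "<ID>"),                              # UUIDs
    (re.compile(r"\b[0-9a-f]{8,}\b", re.I), "<ID>"),         # long hex ids / hashes
    (re.compile(r"\b\d+(\.\d+){1,3}\b"), "<VER>"),            # library versions 1.2.3
    (re.compile(r"\b\d+\b"), "<N>"),                          # line numbers, counts
]


def normalize(message: str) -> str:
    """Replace the variable parts of an error message with placeholders."""
    for pattern, placeholder in _RULES:
        message = pattern.sub(placeholder, message)
    return message.lower()


def tokenize(message: str) -> list[str]:
    """Split a normalised message into words and placeholder tokens."""
    return re.findall(r"<[a-z]+>|[a-z0-9]+", normalize(message))


class TfIdfBaseline:
    """TF-IDF vectors of the baseline errors, queried by cosine similarity."""

    def __init__(self, documents):
        self.docs = [Counter(tokenize(d)) for d in documents]
        n = len(self.docs)
        df = Counter()
        for tf in self.docs:
            df.update(tf.keys())
        # Smoothed idf, log((1+N)/(1+df)) + 1, so a term unseen in the
        # baseline still gets a (large) weight instead of a zero division.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.unseen_idf = math.log(1 + n) + 1
        self.vectors = [self._vector(tf) for tf in self.docs]

    def _vector(self, tf):
        return {t: c * self.idf.get(t, self.unseen_idf) for t, c in tf.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def best_match(self, message):
        """Return (score, index) of the most similar baseline error, or (0.0, -1)."""
        if not self.vectors:
            return 0.0, -1
        q = self._vector(Counter(tokenize(message)))
        scores = [self._cosine(q, v) for v in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return scores[best], best


def is_new_error(baseline, message, threshold=0.5):
    """Decide whether `message` is unlike every baselined error (alert) or a repeat."""
    score, _ = baseline.best_match(message)
    return score < threshold, score


if __name__ == "__main__":
    baseline = TfIdfBaseline(BASELINE_ERRORS)
    for msg in (REPEAT_ERROR, NEW_ERROR):
        new, score = is_new_error(baseline, msg)
        label = "ALERT" if new else "known"
        print(f"{label} score={score:.2f}: {msg.splitlines()[0]}")
```

Most of the work here is done by the normalisation step, not by TF-IDF: without the placeholders, a changed line number or tenant id would be a different token and pull the similarity down for what is really the same error. When an alert does fire, the sender would also add the new message to the baseline store so that later occurrences are matched against it, and the threshold should be tuned on a sample of historical errors before trusting it.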