I am working on an observability solution based on ELK. The monitoring data to be harvested are listed below.
- Application log data (only exceptions / errors)
- System metrics from multiple cloud compute and network devices
- Packetbeat metrics
- Uptime Information using Heartbeat.
The proposed solution will use Elastic Watcher to raise tickets in Jira for incidents identified from the data sources above. However, because nothing correlates the different kinds of monitoring data, a single failure may raise multiple incidents. I would like to correlate the events from Filebeat, Metricbeat, Heartbeat and Packetbeat and identify the root cause of an incident before raising a ticket in Jira.
Example: A Spring Boot REST service runs on Server A and consumes a MongoDB service running on Server B. We ingest the Spring Boot application logs and the MongoDB logs into the Filebeat index. A Metricbeat instance monitors the system metrics of Server A and Server B, and a Heartbeat instance monitors the uptime of the Spring Boot service and the MongoDB service. At time t0 the MongoDB service went offline and the Spring Boot service started logging exceptions. The Watcher on the Filebeat index raised an incident, and the Watcher on the Heartbeat index raised another. The Ops team does not want two incidents, because both have the same root cause: the MongoDB service was unavailable. I want to correlate these events and raise a single incident. How do I achieve this in Elasticsearch?
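To make the desired behaviour concrete, here is a minimal sketch in plain Python (not Elasticsearch) of the correlation I have in mind for the example above. Everything in it is an assumption for illustration only: the `DEPENDS_ON` map, the `Event` fields, the service names and the 60-second window are hypothetical and do not correspond to any Beats schema or Watcher feature.

```python
from dataclasses import dataclass

# Hypothetical, hand-maintained dependency map: service -> services it calls.
DEPENDS_ON = {
    "spring-boot-service": ["mongodb"],
    "mongodb": [],
}


@dataclass
class Event:
    source: str       # which Beat reported it, e.g. "filebeat" or "heartbeat"
    service: str      # service the event is about (assumed to be known)
    timestamp: float  # seconds since some reference time
    message: str


def upstream(service, seen=None):
    """Return every service that `service` depends on, directly or transitively."""
    seen = set() if seen is None else seen
    for dep in DEPENDS_ON.get(service, []):
        if dep not in seen:
            seen.add(dep)
            upstream(dep, seen)
    return seen


def summarise(group):
    """Turn one time window of events into one incident per root-cause service."""
    affected = {e.service for e in group}
    # A root cause is an affected service none of whose dependencies are affected.
    roots = sorted(s for s in affected if not (upstream(s) & affected))
    return [
        {
            "root_cause": root,
            "events": [e for e in group
                       if e.service == root or root in upstream(e.service)],
        }
        for root in roots
    ]


def correlate(events, window_seconds=60.0):
    """Group events that start within `window_seconds` of each other, then summarise."""
    incidents, group = [], []
    for event in sorted(events, key=lambda e: e.timestamp):
        if group and event.timestamp - group[0].timestamp > window_seconds:
            incidents.extend(summarise(group))
            group = []
        group.append(event)
    if group:
        incidents.extend(summarise(group))
    return incidents


# The scenario from the example: MongoDB goes down, then the REST service logs errors.
events = [
    Event("heartbeat", "mongodb", 0.0, "monitor down"),
    Event("filebeat", "spring-boot-service", 5.0, "exception while calling MongoDB"),
]
incidents = correlate(events)
```

With these two events the sketch yields a single incident whose root cause is the MongoDB service, with the Spring Boot exception attached as a related event. That is the outcome I would like to reproduce with Watcher, or with whatever Elasticsearch feature is more appropriate.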
Note: I am aware that the same can be achieved using APM. Here, however, I am looking for a solution that monitors legacy applications where attaching an APM agent or instrumenting the services with a tracing framework is not possible.