Root Cause Identification using filebeat, heartbeat and metricbeat data

harimohanr · February 22, 2022, 5:57am

I am working on an observability solution based on ELK. The monitoring data to be harvested are listed below.

Application log data (only exceptions / errors)
System metrics from multiple cloud compute and network devices
Packetbeat metrics
Uptime Information using Heartbeat.

The proposed solution will use Elastic Watcher to raise tickets in Jira based on the incidents identified from these data sources. However, this may lead to multiple incidents being raised for the same underlying problem, because there is no correlation between the various monitoring data. I would like to correlate the events from Filebeat, Metricbeat, Heartbeat and Packetbeat and find the root cause of an incident before raising a ticket in Jira.

Example: A Spring Boot REST service runs on Server A and consumes a MongoDB service running on Server B. We ingest the Spring Boot application logs and the MongoDB logs into a Filebeat index. Metricbeat monitors the system metrics of Server A and Server B, and Heartbeat monitors the uptime of the Spring Boot service and the MongoDB service. At time t0 the MongoDB service went offline and the Spring Boot service started writing exception logs. The Watcher on the Filebeat index raised an incident, and the Watcher on the Heartbeat index raised another. But the Ops team does not want two incidents, as the root cause was the MongoDB service being unavailable. I want to correlate these events and raise a single incident. How do I achieve this in Elasticsearch?

Note: I am aware that the same can be achieved using APM. But here I am looking for a solution that monitors legacy applications, where attaching an APM agent or instrumenting the services with a tracing framework is not possible.
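
To make the desired behaviour concrete, here is a rough Python sketch of the correlation I have in mind for the Spring Boot / MongoDB example. It is only an illustration under assumptions: the `DEPENDS_ON` map, the `Alert` shape, the service names and the 5-minute window are all hypothetical, and nothing here is an Elasticsearch or Watcher API. The alerts would in practice come from whatever the watches emit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical dependency map (service -> services it depends on).
# This is an assumption: it would have to be maintained by hand or
# derived from some inventory; Elasticsearch does not provide it.
DEPENDS_ON = {
    "spring-boot-service": ["mongodb-service"],
    "mongodb-service": [],
}


@dataclass(frozen=True)
class Alert:
    service: str         # service the alert is about
    source: str          # e.g. "filebeat" or "heartbeat"
    timestamp: datetime  # when the alert fired


def find_root(service, when, alerts, window, seen=None):
    """Follow dependencies that also alerted within `window` of `when`."""
    seen = seen or set()
    seen.add(service)
    for dep in DEPENDS_ON.get(service, []):
        if dep in seen:
            continue  # guard against cycles in the dependency map
        dep_alerted = any(
            a.service == dep and abs(a.timestamp - when) <= window
            for a in alerts
        )
        if dep_alerted:
            return find_root(dep, when, alerts, window, seen)
    return service


def correlate(alerts, window=timedelta(minutes=5)):
    """Group alerts by the service judged to be their root cause."""
    incidents = {}
    for alert in alerts:
        root = find_root(alert.service, alert.timestamp, alerts, window)
        incidents.setdefault(root, []).append(alert)
    return incidents


# The scenario from the example: MongoDB goes down at t0 and the
# Spring Boot service starts logging exceptions shortly afterwards.
t0 = datetime(2022, 2, 22, 5, 0, 0)
alerts = [
    Alert("mongodb-service", "heartbeat", t0),
    Alert("spring-boot-service", "filebeat", t0 + timedelta(seconds=30)),
]
incidents = correlate(alerts)
```

With these two alerts, both end up under a single `mongodb-service` incident, which is the one ticket I would want raised in Jira. The sketch groups only by root service and does not split incidents that are far apart in time, so a real implementation would need that as well.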
