Mean Time Between Failure Heartbeat documents

mando_mat · September 10, 2019, 5:16pm

Hi,
I'm trying to calculate the mean time between recovery (MTBR) and the mean time between failure of some services monitored by Heartbeat. For example, for MTBR, for each service I would like to get the time elapsed between two successive documents that share the same monitor.id and have monitor.status down and up, respectively. How can I do that?

p.s. I can also do further offline operations once I have obtained the data.
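
For the offline route, a minimal sketch of the pairing described above might look like the following. It assumes the Heartbeat documents have already been exported as Python dicts carrying `@timestamp`, `monitor.id` and `monitor.status`; the function names are hypothetical, and measuring "time between failures" from the start of one outage to the start of the next is an assumption, not an agreed definition.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def parse_ts(ts):
    # Accept ISO 8601 timestamps with a trailing "Z" (UTC).
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def outages(docs):
    """Map each monitor.id to a list of (first_down, next_up) datetime pairs."""
    checks = defaultdict(list)
    for d in docs:
        m = d["monitor"]
        checks[m["id"]].append((parse_ts(d["@timestamp"]), m["status"]))

    spans = {}
    for monitor_id, series in checks.items():
        series.sort(key=lambda c: c[0])  # order checks by time
        down_since, found = None, []
        for ts, status in series:
            if status == "down" and down_since is None:
                down_since = ts  # first down check of a new outage
            elif status == "up" and down_since is not None:
                found.append((down_since, ts))  # outage closed by this up check
                down_since = None
        spans[monitor_id] = found
    return spans


def mean_recovery_seconds(spans):
    # Average of (up - down) over all closed outages; None if there are none.
    return mean((up - down).total_seconds() for down, up in spans) if spans else None


def mean_seconds_between_failures(spans):
    # Assumption: gap measured between the starts of consecutive outages.
    starts = [down for down, _ in spans]
    gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
    return mean(gaps) if gaps else None
```

Consecutive down checks are folded into a single outage, and an outage that is still open at the end of the data is ignored. The result is only as precise as the check interval, since the real failure and recovery happen somewhere between two checks.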

Andrew_Cholakian1 · September 10, 2019, 6:44pm

This is actually kind of tricky. I have a branch where I've been working on accurately doing this sort of work here: https://github.com/andrewvc/kibana/tree/timelines . You can track this issue: https://github.com/elastic/uptime/issues/55 . It's on our roadmap. Once we have that underlying infrastructure we can calculate things like MTBR accurately.

mando_mat · September 10, 2019, 7:34pm

Thank you so much for the information. In the meantime, could you point me to a workaround, maybe one that works with aggregations?

Andrew_Cholakian1 · September 10, 2019, 8:08pm

There's not really a great one that you can do in a single query. A prereq for timelines is including the frequency of the check with each message, which will let you calculate a somewhat accurate number for average time down over a period (just the sum of the frequency for all down checks). The timelines PR is more accurate (handling mis-scheduled items) but requires a lot of complex processing in JS.
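
The "sum of the frequency for all down checks" idea can be sketched offline as follows. This is only an illustration: it reads "frequency" as the check interval in seconds, and the per-document field `check_interval_s` is a hypothetical name, since the thread does not say how that value would be stored.

```python
from collections import defaultdict


def estimated_downtime_seconds(docs):
    """Sum the check interval of every down check, per monitor.id."""
    totals = defaultdict(float)
    for d in docs:
        if d["monitor"]["status"] == "down":
            # Hypothetical field: interval (seconds) of the check that produced d.
            totals[d["monitor"]["id"]] += d["check_interval_s"]
    return dict(totals)
```

Each down check is treated as if the service was down for one full interval, so the estimate drifts whenever checks run late or are skipped, which is the mis-scheduling case mentioned above.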

