How to measure Availability KPIs and downtime from heartbeats?

I've configured Heartbeat to send a heartbeat every 30 seconds. Suppose I receive a beat with monitor_status "down", and the next beat is "up".

  • Is there a field that represents the downtime between the beats?
  • What information can I get from the field "beat_monitor_duration_us"?

This is a great idea.

  1. I think it'd be great to have each document include total downtime or uptime. It's kinda tricky in a distributed situation, however. It would probably depend on querying back to Elasticsearch, at least on startup, to handle a Heartbeat restart or a move to another machine. I've opened an issue here to track it. I think we should build it at some point.
  2. The monitor duration tracks how long it took to run the monitor check (in microseconds, as the _us suffix suggests). That means sending out a request, then receiving, downloading, and processing the result. Since processing is negligible, this maps to the overall performance of your site in most cases.
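
On point 1, until something like that is built in, a rough number can be computed on the client side from the beats themselves. Here is a minimal Python sketch, not anything Heartbeat ships. It assumes you have already pulled (timestamp, status) pairs out of Elasticsearch, sorted by time; the function names and the tuple layout are made up for illustration. It also assumes that a "down" beat means the monitor stayed down until the next beat arrived, so with a 30 second schedule the result is only accurate to about 30 seconds per outage.

```python
from datetime import datetime, timedelta


def estimate_downtime(beats):
    """Sum the time between each "down" beat and the beat that follows it.

    `beats` is a list of (timestamp, status) tuples sorted by timestamp,
    where status is "up" or "down". The last beat has no successor,
    so it contributes nothing to the total.
    """
    total = timedelta(0)
    for (ts, status), (next_ts, _) in zip(beats, beats[1:]):
        if status == "down":
            total += next_ts - ts
    return total


def availability(beats):
    """Fraction of the observed window that was not counted as downtime."""
    observed = beats[-1][0] - beats[0][0]
    if observed <= timedelta(0):
        return None  # fewer than two distinct timestamps, nothing to measure
    return 1 - estimate_downtime(beats) / observed


# Hypothetical sample: one beat every 30 seconds, two of them "down".
start = datetime(2020, 1, 1, 0, 0, 0)
statuses = ["up", "down", "down", "up", "up"]
beats = [(start + timedelta(seconds=30 * i), s) for i, s in enumerate(statuses)]

print(estimate_downtime(beats))  # 0:01:00
print(availability(beats))       # 0.5
```

The same idea could be pushed into an Elasticsearch aggregation, but doing it client side keeps the assumption about what happens between beats explicit and easy to change.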