How do I consume Heartbeat data for meaningful alerting?

geoffmore · February 18, 2020, 10:02pm

I have created a Watcher that consumes the data from a named ICMP Heartbeat monitor [1] running as a Kubernetes DaemonSet [2]. I have not yet gotten approval to post the Watcher's configuration, so I will give a high-level description of how I get the data.

The Watcher uses the following fields, which are documented here [3] and here [4]:

agent.hostname
rtt.us
url.domain
monitor.type
monitor.name

To get to my desired data:

Documents are filtered using the "icmp" monitor type and a provided monitor name
Documents are separated into unique buckets using agent.hostname (source) and url.domain (dest) [5], with a max bucket count of 10k (this may be the maximum)
A max aggregation [6] is performed on each of those composite buckets
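
As a rough illustration of the steps above (not the actual Watcher configuration, which is not shared here), the search body could be built along these lines. The function name, the aggregation names "pairs" and "max_rtt", the page size of 1000, and the use of @timestamp for the time range are all assumptions made for the sketch. The round-trip-time field is passed in as a parameter because the full field path in a given Heartbeat version may differ from the short name listed above.

```python
import json


def build_pair_query(monitor_name, rtt_field="rtt.us", lookback="5m",
                     page_size=1000, after_key=None):
    """Build a search body: filter, composite buckets per pair, max per bucket.

    All names here are illustrative; adjust rtt_field to the field path
    used by your Heartbeat version.
    """
    composite = {
        "size": page_size,  # buckets returned per page of the composite agg
        "sources": [
            {"source": {"terms": {"field": "agent.hostname"}}},
            {"dest": {"terms": {"field": "url.domain"}}},
        ],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key

    return {
        "size": 0,  # only aggregation results are needed, not hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"monitor.type": "icmp"}},
                    {"term": {"monitor.name": monitor_name}},
                    {"range": {"@timestamp": {"gte": f"now-{lookback}"}}},
                ]
            }
        },
        "aggs": {
            "pairs": {
                "composite": composite,
                "aggs": {"max_rtt": {"max": {"field": rtt_field}}},
            }
        },
    }


if __name__ == "__main__":
    # "my-icmp-monitor" is a placeholder monitor name.
    print(json.dumps(build_pair_query("my-icmp-monitor"), indent=2))
```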

Heartbeat is configured to create a ping every 5 seconds; both the Input [6] and Trigger [7] are configured to look back 5m, so I expect 60 documents per pair over that duration.
On my largest Kubernetes cluster, there are 23 nodes, so I expect 60 * 23^2 (~31.7k) documents across 23^2 * 2 (~1k) buckets (23^2 for the composite, x2 because each composite bucket gets a max agg). Even though my max buckets is configured at 10k, I have noticed that the response includes an after_key [8] and is truncated after around 16 pairs (when iterating a-a -> w-w).
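
The expected counts can be checked with a few lines of arithmetic. This sketch assumes one document per ping, that every node pings every node (including itself), and that the max aggregation is counted as one extra bucket per pair; the function name is made up for illustration.

```python
def expected_counts(nodes, ping_interval_s=5, lookback_s=300):
    """Estimate documents and buckets for an all-pairs ICMP monitor."""
    pairs = nodes ** 2                             # source x dest combinations
    docs_per_pair = lookback_s // ping_interval_s  # pings inside the window
    return {
        "pairs": pairs,
        "docs_per_pair": docs_per_pair,
        "documents": pairs * docs_per_pair,
        "buckets_with_max": pairs * 2,             # composite bucket + max agg
    }


if __name__ == "__main__":
    print(expected_counts(23))
```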

I would like to know what I should be doing to return the entire output for consumption by my Watcher. I would also like to know how to address the cases where I scale from 2 to n pages of output based on the number of buckets I'm returning. I believe the Chain Input [9] solves the problem with a known number of pages, but I am not sure if this is the best and/or most scalable approach.
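
For reference, this is how after_key paging works when the query is run from a client script rather than from inside Watcher: each page's after_key is fed back as the composite "after" value until a page comes back empty. It is only a sketch of the mechanism and does not settle whether the Chain Input is the right tool. The search argument is a hypothetical callable that takes a search body and returns the response as a dict, and "pairs" is an assumed aggregation name.

```python
import copy


def collect_all_buckets(search, body, agg_name="pairs"):
    """Page through a composite aggregation and return every bucket.

    search   -- hypothetical callable: search(body) -> response dict
    body     -- search body containing a composite agg under aggs[agg_name]
    """
    body = copy.deepcopy(body)  # do not mutate the caller's body
    composite = body["aggs"][agg_name]["composite"]
    buckets = []
    while True:
        agg = search(body)["aggregations"][agg_name]
        page = agg.get("buckets", [])
        buckets.extend(page)
        after_key = agg.get("after_key")
        # Stop when a page is empty or no key is given to continue from.
        if not page or after_key is None:
            return buckets
        composite["after"] = after_key
```

Because the loop runs until the pages are exhausted, the number of pages does not need to be known in advance.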

As an aside, if/when I am able to share this Watcher code, where would be the best public repo to keep it?

(edit: fixed formatting issue)

system · March 17, 2020, 10:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
