What do people here think of creating a repository to store default (or example) Watcher alerts for the kubernetes and/or system module? (I'm also in favor of other modules but we should pick somewhere to start).
My hunch is that most teams would want a similar set of basic alerts, and that most teams don't have the best coverage so far. Combining efforts would be very helpful.
There are some sample watches in this repository:
If none of the samples is what you're looking for, perhaps we can include some additional sample watches.
Hi @Michael_Madden, thanks for the reply.
I've seen this repo. To be more specific, I'm suggesting we create a full suite of alerts that teams can apply to get a base level of monitoring for each host running Metricbeat.
That means alerts for CPU, memory, and filesystem usage (system module), plus things like pending pods, pods in a crash loop, and pods using more resources than requested (kubernetes module).
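To make the idea concrete, here is a rough sketch of what one such watch could look like for the CPU case, built as a Python dict so it can be dumped to a watch.json file. The index pattern `metricbeat-*`, the 0.9 threshold, the 5 minute window, the action name `log_high_cpu`, and the choice of `system.cpu.total.norm.pct` are all assumptions for illustration and would need adjusting per cluster.

```python
import json


def cpu_watch(threshold=0.9, window="5m"):
    """Build a watch body that fires when any recent sample shows high CPU.

    threshold and window are illustrative defaults, not recommendations.
    """
    return {
        # Run the check once a minute.
        "trigger": {"schedule": {"interval": "1m"}},
        "input": {
            "search": {
                "request": {
                    # Assumed index pattern for Metricbeat data.
                    "indices": ["metricbeat-*"],
                    "body": {
                        "size": 0,
                        "query": {
                            "bool": {
                                "filter": [
                                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                                    {"range": {"system.cpu.total.norm.pct": {"gte": threshold}}},
                                ]
                            }
                        },
                    },
                }
            }
        },
        # Fire if at least one document matched both filters.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        "actions": {
            # A logging action keeps the sketch free of mail or webhook setup.
            "log_high_cpu": {
                "logging": {
                    "text": "High CPU: {{ctx.payload.hits.total}} samples above threshold"
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(cpu_watch(), indent=2))
```

Memory and filesystem watches would presumably follow the same shape with a different field and threshold.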
Ideally one would just run make apply, and we'd HTTP PUT each watch.json to the cluster.
kube-prometheus does something very similar for the Prometheus/Grafana stack.
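As a sketch of what the apply step might do under the hood: walk a directory of watch files and PUT each one to the Watcher API, using the file name (minus `.json`) as the watch ID. The directory layout, the helper names, and the unauthenticated `http://localhost:9200` endpoint are assumptions here; a real cluster would most likely need credentials and TLS.

```python
import json
import pathlib
import urllib.request


def watch_id(path):
    """Derive the watch ID from the file name, e.g. high_cpu.json -> high_cpu."""
    return pathlib.Path(path).stem


def build_put_request(base_url, path, body):
    """Build (but do not send) the PUT request for one watch file."""
    url = f"{base_url.rstrip('/')}/_watcher/watch/{watch_id(path)}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_all(base_url, directory):
    """PUT every *.json file in directory to the cluster at base_url."""
    for path in sorted(pathlib.Path(directory).glob("*.json")):
        body = path.read_bytes()
        json.loads(body)  # fail early on malformed JSON before touching the cluster
        request = build_put_request(base_url, path, body)
        with urllib.request.urlopen(request) as response:
            print(path.name, response.status)


if __name__ == "__main__":
    # Hypothetical layout: one watch per file under ./watches
    apply_all("http://localhost:9200", "watches")
```

A Makefile target could then simply call this script, so that make apply stays the only command a team has to remember.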
Just to follow up here... does that sound reasonable?