What do people here think of creating a repository to store default (or example) Watcher alerts for the Metricbeat kubernetes and/or system modules? (I'm also in favor of covering other modules, but we should pick somewhere to start.)
My hunch is that most teams would want a similar set of basic alerts, and that most teams don't have the best coverage so far. Combining efforts would be very helpful.
I've seen this repo. To be more specific, I'm suggesting we create a full suite of alerts that teams can apply to get a base level of monitoring for each host running Metricbeat.
For the system module, that would mean alerts for CPU, memory, and filesystem usage. For the kubernetes module, it would mean things like pending pods, pods in a crash loop, and pods using more resources than they requested.
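To make the idea concrete, here is a rough sketch of what one of the system module alerts could look like, written as a Python function that builds the watch body. The index pattern `metricbeat-*`, the field `system.cpu.total.pct`, the 90% threshold, and the function name `cpu_watch` are all my assumptions for illustration; a real suite would have to match each team's Metricbeat version and index naming.

```python
import json


def cpu_watch(threshold=0.9, window="5m", interval="1m"):
    """Build a watch body that fires when any host reports high CPU.

    Assumptions (not from any existing repo): Metricbeat writes to
    indices matching "metricbeat-*" and reports CPU usage in the
    "system.cpu.total.pct" field.
    """
    return {
        # Run the search on a fixed schedule.
        "trigger": {"schedule": {"interval": interval}},
        "input": {
            "search": {
                "request": {
                    "indices": ["metricbeat-*"],
                    "body": {
                        "size": 0,
                        "query": {
                            "bool": {
                                "filter": [
                                    # Only look at recent documents.
                                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                                    # Keep documents at or above the CPU threshold.
                                    {"range": {"system.cpu.total.pct": {"gte": threshold}}},
                                ]
                            }
                        },
                    },
                }
            }
        },
        # Fire if at least one document matched.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        # A logging action keeps the sketch free of external services.
        "actions": {
            "log_high_cpu": {
                "logging": {"text": "High CPU usage seen in the last " + window}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(cpu_watch(), indent=2))
```

The printed JSON could be saved as one file per alert, so that the memory, filesystem, and kubernetes alerts follow the same shape with a different query and threshold.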
Ideally one would just run make apply, and we'd send an HTTP PUT for each watch.json to the cluster.
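As a sketch of what such an apply step might do under the hood, the script below walks a directory of watch files and PUTs each one to the Watcher API. It assumes one JSON file per watch with the file name used as the watch ID, an unauthenticated cluster, and the `_watcher/watch/<id>` endpoint (older releases expose it under `_xpack/watcher/watch/<id>`). The function names and the `watches` directory are hypothetical.

```python
import json
import pathlib
import urllib.request


def build_put_request(base_url, watch_path):
    """Build (but do not send) the PUT request for one watch file.

    Assumption: the watch ID is the file name without ".json".
    """
    path = pathlib.Path(watch_path)
    body = path.read_bytes()
    json.loads(body)  # fail early if the file is not valid JSON
    url = f"{base_url.rstrip('/')}/_watcher/watch/{path.stem}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_watches(base_url, directory):
    """PUT every *.json file in the directory to the cluster."""
    for path in sorted(pathlib.Path(directory).glob("*.json")):
        request = build_put_request(base_url, path)
        # Raises urllib.error.HTTPError on a non-2xx response.
        with urllib.request.urlopen(request) as response:
            print(path.name, response.status)


if __name__ == "__main__":
    # Hypothetical defaults; a Makefile target could pass real values.
    apply_watches("http://localhost:9200", "watches")
```

A make apply target would then only need to call this script with the cluster URL. Authentication and TLS are left out here and would need to be added for a secured cluster.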
kube-prometheus does something very similar for the Prometheus/Grafana stack.