Alerting if a beat is down

Hello,

I have many Beats running on different hosts. I need to monitor these agents, so I've enabled xpack.monitoring, and now I can see event rate, fail rate, CPU utilization, etc. in Kibana. Can I somehow create alerts for when a Beat's event rate has been zero for the last 10 hours, for example? Or is there any other way to alert on the uptime of Beats?

Maybe Kibana alerting? See: https://www.elastic.co/what-is/kibana-alerting

I think @sfenman is asking a valid question.
Maybe I can rephrase: what data is used to compute the
'Fail Rate/s' for a Beats instance?

I have been trying to work this out from the .monitoring-beats-7 index data, but for the love of god, I cannot figure out which fields are used to compute the Fail Rate.

This is definitely a valid question, and not an easy one to answer. One of the issues with Elasticsearch, and NoSQL in general, is IMHO that it's hard to detect something that is missing, because to do so you need to know what is supposed to be running. You would need a query that first retrieves all hosts where Beat x/y should be running and then checks whether data is coming in from all of those Beats. (The same issue applies to similar queries; for example, try finding a host where a certain service does not exist.)
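
To make the "expected vs. actually reporting" idea concrete, here is a minimal Python sketch of the comparison step only. It assumes you already have an inventory of hosts from somewhere outside Elasticsearch and a mapping of host to last-seen timestamp; the function name, host names, and timestamps are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_silent_hosts(expected_hosts, last_seen, now, max_silence=timedelta(hours=10)):
    """Return the expected hosts that have no data newer than now - max_silence."""
    cutoff = now - max_silence
    silent = []
    for host in sorted(expected_hosts):
        seen = last_seen.get(host)  # None means the host never reported at all
        if seen is None or seen < cutoff:
            silent.append(host)
    return silent


# Hypothetical inventory and last-seen timestamps, for illustration only.
now = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
expected = {"web-01", "web-02", "db-01"}
last_seen = {
    "web-01": now - timedelta(minutes=5),   # reporting normally
    "web-02": now - timedelta(hours=11),    # silent for longer than the threshold
    # "db-01" is missing entirely: it never sent any data
}

print(find_silent_hosts(expected, last_seen, now))  # ['db-01', 'web-02']
```

The point is that the inventory has to come from outside the data itself: a host that never reported ("db-01" here) can only be caught because it appears in the expected set.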

AFAIK, relying on the Beat monitoring functionality is not a good idea, because when a Beat is down for whatever reason, there is a chance there won't be any monitoring data at all. Theoretically you could create a separate alert for each host/Beat pair and check whether there is data, but that is not sustainable for tens of thousands of hosts.
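
As a rough alternative to one alert per host, a single search with a terms aggregation and a max sub-aggregation can return the latest timestamp per host in one request. The Python sketch below only builds the request body and parses a response of the usual aggregation shape. The field names `host.name` and `@timestamp`, the aggregation names, and the sample response are assumptions for illustration; adjust them to whatever your indices actually contain.

```python
from datetime import datetime, timezone


def build_last_seen_query(host_field="host.name", time_field="@timestamp", max_hosts=10000):
    """Build a search body that returns the newest timestamp per host."""
    return {
        "size": 0,  # no individual hits needed, only the aggregation
        "aggs": {
            "by_host": {
                "terms": {"field": host_field, "size": max_hosts},
                "aggs": {"last_seen": {"max": {"field": time_field}}},
            }
        },
    }


def parse_last_seen(response):
    """Turn the aggregation buckets into a {host: datetime} mapping."""
    result = {}
    for bucket in response["aggregations"]["by_host"]["buckets"]:
        millis = bucket["last_seen"]["value"]  # max on a date field is epoch milliseconds
        result[bucket["key"]] = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return result


# Hand-written sample response (not real output), trimmed to the relevant part.
sample_response = {
    "aggregations": {
        "by_host": {
            "buckets": [
                {"key": "web-01", "doc_count": 42, "last_seen": {"value": 1609502400000.0}},
            ]
        }
    }
}

print(parse_last_seen(sample_response))
```

This still only lists hosts that have sent data at some point, so it has to be compared against a list of hosts that are supposed to be running. For very large host counts, a paginated approach such as a composite aggregation may be a better fit than one large terms aggregation.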

I haven't had the chance to test the new Agent and Fleet yet. If designed correctly, this could be a very interesting feature to integrate from the start. Once an agent is enrolled, it should not disappear from Fleet when there is no incoming data (as it currently does in Beats monitoring) and should instead trigger some configured Kibana connector. Unenrolling should be a manual (or automated) action.
