Hi All,
I need some advice on how to implement Watcher alerting. For example, we have 10 application servers, each with 5 programs running as processes. I have already installed Metricbeat on these servers and can see documents in the metricbeat index containing fields like:
process.name: chrome.exe
system.process.state: running
Metricbeat is set to check every 30 seconds, so it sends 1 document per process every 30 seconds, which works out to 10 documents every 5 minutes.
I want to implement alerts for when one of these processes has crashed or is not running. This is where it gets a bit difficult. As far as I can tell from my setup, Metricbeat only reports a process while it is running; you don't get a document saying that the process is suspended, crashed, or whatever else (unless someone has a magic solution?).
So the only way I can think of to implement alerting is to create a watcher that counts the number of documents, with a condition that triggers an alert if fewer than 5 documents are counted within a 5-minute period (to allow for missed messages?).
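To make the counting idea concrete, here is a rough Python sketch of the logic I have in mind. It is not a watcher definition and does not talk to Elasticsearch; the function name `find_missing`, the dictionary keys (`host`, `process`, `timestamp`), and the sample host and process names are all made up for illustration. It just assumes I already have the recent documents in hand and want to flag any server/process pair with too few of them.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_missing(docs, expected_pairs, now,
                 window=timedelta(minutes=5), min_docs=5):
    """Return the (host, process) pairs that have fewer than
    min_docs documents with a timestamp inside the window ending at now.

    docs           -- iterable of dicts with hypothetical keys
                      "host", "process" and "timestamp" (a datetime)
    expected_pairs -- every (host, process) pair that should be running
    """
    cutoff = now - window
    # Count documents per (host, process) pair inside the window.
    counts = Counter(
        (d["host"], d["process"])
        for d in docs
        if cutoff <= d["timestamp"] <= now
    )
    # A pair with no documents at all has a count of 0 and is flagged too.
    return sorted(p for p in expected_pairs if counts[p] < min_docs)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 5, 0)
    # Hypothetical sample: one healthy process reporting every 30 seconds,
    # and one process that only produced 2 documents in the window.
    docs = [{"host": "app01", "process": "chrome.exe",
             "timestamp": now - timedelta(seconds=30 * i)} for i in range(10)]
    docs += [{"host": "app01", "process": "other.exe",
              "timestamp": now - timedelta(seconds=30 * i)} for i in range(2)]
    expected = [("app01", "chrome.exe"), ("app01", "other.exe")]
    print(find_missing(docs, expected, now))
```

The threshold of 5 out of an expected 10 documents is the same tolerance for missed messages described above. Note that the sketch needs an explicit list of expected pairs, because a process that has stopped produces no documents at all and so cannot be discovered from the data alone.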
This seems awfully complicated for something that's quite a basic function of a monitoring tool. By my calculations I need to implement 50 watchers (10 servers × 5 processes).
Is this the only way to do it? Or does someone have any other ideas?