Hello!
We have a piece of software that is critical to us, so we want to monitor its availability. We want to start by monitoring the process/service it creates when running. The software runs on Linux systems. So far, I have installed the Elastic Agent and configured the policy to collect System process metrics.
I am in fact receiving documents while the process (BESClient specifically) is running. However, I need to create a monitoring rule that triggers when the process is not running. I tried defining a rule that alerts me when there is one or fewer documents in the last 5 minutes, but that doesn't trigger anything.
Here are a few pointers that could help us understand and troubleshoot this further:
Could you please share the time window used to monitor this service? The last 5/10/15/30 minutes?
Can you try running the query in the Discover tab to see its output and which records are returned?
Ideally, if the service is down and there are no records for this process, your alert should trigger when you check, say, the last 5 or 10 minutes. If the time window is 24 hours, however, the rule might not trigger, because older records will still satisfy the condition.
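To illustrate why the window size matters, here is a simplified model of a "document count is less than or equal to a threshold over the window" condition. It is a sketch under assumptions, not how Kibana evaluates rules internally: the function names, the 10-second reporting interval, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def count_in_window(doc_times, now, window):
    """Count documents whose timestamp falls inside [now - window, now]."""
    start = now - window
    return sum(1 for t in doc_times if start <= t <= now)


def rule_fires(doc_times, now, window, threshold=1):
    """Simplified 'count <= threshold over the window' condition."""
    return count_in_window(doc_times, now, window) <= threshold


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Hypothetical: one process document every 10 s from 10:00 to 11:00,
    # after which the process stops and no more documents arrive.
    docs = [now - timedelta(hours=2) + timedelta(seconds=10 * i) for i in range(361)]
    print(rule_fires(docs, now, timedelta(minutes=5)))  # True: no docs since 11:55
    print(rule_fires(docs, now, timedelta(hours=24)))   # False: 361 older docs still match
```

With a short window the stopped process produces a count of 0 and the condition holds; with a 24-hour window the documents from before the outage keep the count above the threshold.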
I am using the last 5 minutes.
There are no results for the agent when the process is not running; results come back once I re-enable the process.
I enabled the option “Alert me if there's no data” and that will trigger the alert in Elastic.
Is that a good approach?
Update: my approach worked when there was a single agent in the Fleet policy, but once I added a second one, the alert rule no longer triggers, even though I group the alerts by host.name.
I think it is not working because while the process is not running on one of the hosts, it is still running on the other, so the query does return results.
It seems the alert does not trigger because there are no records to group by. Group by only applies when records exist and the condition is met. For example, if each host sent up/down messages, a rule grouped by hostname would create a separate trigger for every hostname reporting "down". In your case, no records are received at all from the hosts where the process has stopped, so if, say, 4 out of 5 hosts are silent, the rule has nothing to group and cannot raise an alert for those 4 hosts.
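A minimal sketch of this limitation, assuming groups are built only from the documents that exist in the window (the hostnames and document shape are hypothetical, and this is not Kibana's actual implementation):

```python
from collections import Counter


def group_counts(docs):
    """Count documents per host.name, the way a grouped rule buckets them."""
    return Counter(doc["host.name"] for doc in docs)


def alerting_groups(docs, threshold=1):
    """Hosts whose count is <= threshold. Only hosts that sent data can appear."""
    return sorted(host for host, n in group_counts(docs).items() if n <= threshold)


if __name__ == "__main__":
    # Hypothetical: BESClient runs on host-a but is stopped on host-b,
    # so only host-a produces process documents in the last 5 minutes.
    docs = [{"host.name": "host-a", "process.name": "BESClient"}] * 30
    print(dict(group_counts(docs)))  # {'host-a': 30}
    print(alerting_groups(docs))     # []
```

host-b never becomes a group because it contributed no documents, so there is no bucket that could satisfy the "one or fewer" condition on its behalf.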
One option is to use Watcher, as in a similar use case:
You will have to put all the hostnames for which you expect a record into an array [ ], and if the count is 0 for any of those hosts, the watch adds it to the list of missing hostnames.
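The comparison step could look like the sketch below. In a real watch this logic would live in a Painless condition or transform script; the Python version only mirrors it. It assumes the search returns a `terms` aggregation on `host.name` (filtered to `process.name: BESClient` over the check window), whose buckets have `key` and `doc_count`. The hostnames are hypothetical.

```python
def missing_hosts(expected_hosts, buckets):
    """Return the expected hosts that reported no documents in the window.

    `buckets` mirrors the shape of a terms-aggregation response:
    [{"key": "<host.name>", "doc_count": <int>}, ...]
    """
    counts = {b["key"]: b["doc_count"] for b in buckets}
    # A host absent from the buckets entirely is treated as 0 documents.
    return [host for host in expected_hosts if counts.get(host, 0) == 0]


if __name__ == "__main__":
    expected = ["host-a", "host-b"]  # hypothetical hostnames
    buckets = [{"key": "host-a", "doc_count": 30}]  # host-b sent nothing
    print(missing_hosts(expected, buckets))  # ['host-b']
```

Because the expected list is fixed up front, a silent host is detected by its absence from the buckets instead of relying on a bucket that never exists.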
Example using the Kibana sample data set kibana_sample_data_ecommerce:
Output when it checks whether the records received in the last 15 minutes, grouped by gender, have a count < 1:
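A sketch of what such a watch could look like for the sample data, built as a Python dict and printed as the JSON body for `PUT _watcher/watch/<watch-id>`. The index name, the 15-minute window, and grouping by gender come from the example above; the field names `order_date` and `customer_gender`, the expected values, the `by_gender` aggregation name, the logging action, and the Painless condition are assumptions to verify against your own cluster before use.

```python
import json

# Hypothetical list of groups we always expect to see; adjust to your data.
EXPECTED_GENDERS = ["MALE", "FEMALE"]

# Painless condition (assumed): fire if any expected key has no bucket
# or a bucket with a doc_count of 0.
CONDITION_SCRIPT = """
List found = new ArrayList();
for (def b : ctx.payload.aggregations.by_gender.buckets) {
  if (b.doc_count > 0) { found.add(b.key); }
}
for (def g : params.expected) {
  if (!found.contains(g)) { return true; }
}
return false;
"""


def build_watch(expected, interval="15m"):
    """Build a watch that checks the last `interval` of sample data."""
    return {
        "trigger": {"schedule": {"interval": interval}},
        "input": {
            "search": {
                "request": {
                    "indices": ["kibana_sample_data_ecommerce"],
                    "body": {
                        "size": 0,
                        "query": {"range": {"order_date": {"gte": "now-" + interval}}},
                        "aggs": {"by_gender": {"terms": {"field": "customer_gender"}}},
                    },
                }
            }
        },
        "condition": {
            "script": {
                "source": CONDITION_SCRIPT,
                "lang": "painless",
                "params": {"expected": expected},
            }
        },
        "actions": {
            "log_missing": {
                "logging": {"text": "Expected groups with no documents in the last " + interval}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_watch(EXPECTED_GENDERS), indent=2))
```

For the BESClient case, the same shape would apply with the Elastic Agent process metrics index, a `process.name` filter, a `terms` aggregation on `host.name`, and the list of expected hostnames as `params.expected`.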