A periodic task is supposed to run once a minute. I send an alert when it hangs, using the following condition in a Threshold Alert.
WHEN count() GROUPED OVER top 1 'myPeriodicTaskLog'
IS BELOW 1 FOR THE LAST 2 minutes
My complication: This task is occurring separately in multiple Docker Instances, and I want to check that none of them is blocked.
I want to say "The key myPeriodicTaskLog must occur each minute in each instance. Otherwise send an alert."
I have the field instance_name. Each instance's name is assigned pseudorandomly on each deployment (i.e., something like "a58hgh12g2"). So, I cannot code the condition to include these names as literals but can use these values to aggregate.
Thank you. That query does not mention myPeriodicTaskLog, so it seems to track node liveness rather than the liveness of that task on each node. Is that right?
I'd like to do the following:
node_list = query for the list of nodes that have been live at all in the last minute, based on the instance_name field, which occurs in every log line
Then query for the presence of myPeriodicTaskLog in the last minute, grouped by node, for each node in node_list. If that query does not return at least 1 hit for every node, send an alert.
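To make the intended logic concrete, here is a minimal sketch in Python that works on in-memory log records instead of Elasticsearch. The record shape (dicts with `timestamp`, `instance_name`, and `message`) and the function name `stalled_instances` are assumptions made up for illustration; only `instance_name` and `myPeriodicTaskLog` come from my actual setup.

```python
from datetime import datetime, timedelta


def stalled_instances(log_lines, now, window=timedelta(minutes=1),
                      task_key="myPeriodicTaskLog"):
    """Return instances that logged something in the window but never logged task_key.

    log_lines is assumed to be an iterable of dicts with the hypothetical keys
    'timestamp' (datetime), 'instance_name' (str) and 'message' (str).
    """
    cutoff = now - window
    live = set()      # instances that logged anything inside the window
    ran_task = set()  # instances that logged the periodic task inside the window
    for line in log_lines:
        if line["timestamp"] < cutoff:
            continue  # too old to count as evidence of liveness
        live.add(line["instance_name"])
        if task_key in line["message"]:
            ran_task.add(line["instance_name"])
    # Live instances without the task log are the ones to alert on.
    return sorted(live - ran_task)


# Illustrative data only; the instance names are invented.
now = datetime(2024, 1, 1, 12, 0, 0)
logs = [
    {"timestamp": now - timedelta(seconds=30), "instance_name": "a58hgh12g2",
     "message": "myPeriodicTaskLog done"},
    {"timestamp": now - timedelta(seconds=20), "instance_name": "b77kd03x1q",
     "message": "handled request"},
]
print(stalled_instances(logs, now))
```

An alert would fire whenever the returned list is non-empty. Note that an instance that has stopped logging entirely would not appear in `live`, so this only catches the case where the node is up but the task is blocked.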
The following watch will look at packetbeat data by grouping all data by ip address, then using that grouping, determine if any of the documents in each ip bucket are missing a response code. If so, it will fire an alert. This feels similar to what you're doing so hopefully this will help. One thing I recommend is writing an ES query that will actually detect the data you are hoping to use in the condition. If you can do that, you can definitely create a watch for it.
Thank you. It looks like ctx.payload.aggregations.unique_beat_names.buckets gives the list of unique IPs. In our case, that will be nodes/instances grouped by instance_name rather than by IP.
Then this script gives the boolean that decides whether to alert. Importantly, it looks like it is written in the Painless scripting language, which seems rich enough to encode the logic I need.
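As a sanity check on my understanding, here is the same condition sketched in Python rather than Painless. It assumes the payload has the shape of a terms aggregation whose buckets each carry a filter sub-aggregation with a `doc_count`; the aggregation names `unique_instances` and `task_logs` are hypothetical placeholders, not names from the watch above.

```python
def should_alert(payload, terms_agg="unique_instances", task_agg="task_logs"):
    """Return True if any instance bucket has no matching task log documents.

    payload mimics ctx.payload: one bucket per instance_name, each with a
    sub-aggregation (assumed name task_agg) counting myPeriodicTaskLog hits.
    """
    buckets = payload["aggregations"][terms_agg]["buckets"]
    return any(bucket[task_agg]["doc_count"] < 1 for bucket in buckets)


# Illustrative payload only; keys and counts are invented.
example_payload = {
    "aggregations": {
        "unique_instances": {
            "buckets": [
                {"key": "a58hgh12g2", "doc_count": 40, "task_logs": {"doc_count": 1}},
                {"key": "b77kd03x1q", "doc_count": 35, "task_logs": {"doc_count": 0}},
            ]
        }
    }
}
print(should_alert(example_payload))
```

If this matches what the watch condition does, then the Painless version only needs to loop over the buckets and return true as soon as one of them has a zero count.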