Configure a scheduled watcher against monitoring indices that only looks at the latest X minutes of data

Hi,
I have a separate monitoring cluster which houses all the xpack.monitoring data shipped from the production Elasticsearch cluster. I want to put this monitoring data to good use by creating some alerts, so I have been reading up on watches. First of all, is it normal practice to configure watches that query the monitoring indices? After all, the whole reason to create a separate monitoring cluster is so that we don't repeatedly hit the production cluster's APIs. Secondly, it looks like watches are stateless.
Let's say I want to create an alert that checks all the monitoring indices on the monitoring cluster every 10 minutes and fires if CPU usage goes beyond 80%. Suppose the alert fires and CPU then drops back under 80%: won't the next run still find the old 80% reading in one of the indices, given that I plan to retain data for the default 3 days? From what I saw, every day gets its own index. Should the watch be smart and only query a recent time range within the index, say the last 5 minutes of document timestamps? Does the data in the monitoring index get overwritten, or does it keep all previous values?
Or do I need to record the previous state myself? I want to avoid comparing against a previous value unless it's the only approach. Any suggestions?

I examined the indices created on the monitoring cluster. It looks like the same day's data is written into the day's index under different types, so fetching the first document ordered by timestamp descending should give the latest state at that point in time. Alternatively, you can filter the index you are searching by date.
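
As a sanity check, this is roughly the query I mean, assuming the .monitoring-es-* index pattern and a type field of node_stats (both assumptions on my part; adjust to whatever your cluster actually writes):

GET .monitoring-es-*/_search
{
  "size": 1,
  "sort": [ { "timestamp": "desc" } ],
  "query": { "term": { "type": "node_stats" } }
}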

When monitoring time-based data, your watches should indeed contain a filter in their searches to only drill down into the last n minutes.
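
For example, a minimal watch for your CPU scenario could look like the sketch below. Treat the index pattern, the type value, and the node_stats.process.cpu.percent field as assumptions about how your monitoring documents are structured, and verify them against your own indices first:

PUT _xpack/watcher/watch/high_cpu
{
  "trigger": {
    "schedule": { "interval": "10m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": [ ".monitoring-es-*" ],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "term": { "type": "node_stats" } },
                { "range": { "timestamp": { "gte": "now-10m" } } },
                { "range": { "node_stats.process.cpu.percent": { "gt": 80 } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log_it": {
      "logging": { "text": "CPU went above 80% within the last 10 minutes" }
    }
  }
}

Because each run only considers the last 10 minutes, an old spike still sitting in your 3-day retention window will not re-trigger the alert; the time filter is what gives the stateless watch a bounded view of the data.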

Check the examples repo for a few more examples.

Thanks @spinscale

One last question: if you look at the example at https://www.elastic.co/guide/en/watcher/current/watching-marvel-data.html#watching-cluster-health

"query": {
  "bool": {
    "filter": {
      "range": {
        "timestamp": {
          "gte": "now-2m",
          "lte": "now"
        }
      }
    }
  }
},

Does time filtering work differently here? The text says it's looking at the last 60 seconds, but the query looks at the last 2 minutes. Am I missing something?
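
If it really were the last 60 seconds, I would have expected the filter to be something like:

"range": {
  "timestamp": {
    "gte": "now-60s",
    "lte": "now"
  }
}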

It also looks like the JVM usage example is expected to work on a single bucket of the per-minute interval aggregation. For example, a now-1m window can fall across 2 timestamp buckets, each containing the same node; does it take the average of heap_used_percent across the buckets?

The example you are linking to uses aggregations in its condition to decide whether it should trigger, but I think you can go with your own approach for now. That example is somewhat outdated and should be replaced, though.
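
As a rough sketch of that aggregation-plus-condition pattern (the field name here is an assumption, not copied from the linked example): aggregate in the input, then compare against the aggregation result in the condition.

"input": {
  "search": {
    "request": {
      "indices": [ ".monitoring-es-*" ],
      "body": {
        "size": 0,
        "query": {
          "bool": {
            "filter": { "range": { "timestamp": { "gte": "now-10m" } } }
          }
        },
        "aggs": {
          "max_heap": {
            "max": { "field": "node_stats.jvm.mem.heap_used_percent" }
          }
        }
      }
    }
  }
},
"condition": {
  "compare": { "ctx.payload.aggregations.max_heap.value": { "gt": 75 } }
}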

Yes. I'm not sure how the indices were structured prior to 6.4, but it looks like in the latest versions you just fetch a timestamp range and then aggregate on nodes/indices for the info, whereas the earlier versions fetch a range, aggregate on time intervals, and only then on nodes/indices.
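
Concretely, I mean a flat aggregation like the following against the newer documents (field names such as source_node.name and node_stats.jvm.mem.heap_used_percent are my assumptions from browsing the indices), whereas the old Marvel example wrapped this in a date_histogram on timestamp first:

"aggs": {
  "nodes": {
    "terms": { "field": "source_node.name" },
    "aggs": {
      "avg_heap": {
        "avg": { "field": "node_stats.jvm.mem.heap_used_percent" }
      }
    }
  }
}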

Would you be able to tell me the difference, just to satisfy my curiosity?

The example queries the Marvel indices. Marvel was the product that predated monitoring, and its data was structured differently, thus requiring a different query.
