Seeking your advice and help.
Currently, Heartbeat checks system health every 5 minutes and saves the data to Elasticsearch. Elastalert2 then retrieves the data from Elasticsearch.
I have 2 systems being monitored:
- Application 1
- Application 2
Below are my expectations:
- Monitor every 5 minutes and trigger an alert if there are 6 "down" hits, for the group or for an individual application, within the past 30 minutes
Below is my Elastalert2 configuration:
config.yml
run_every:
  minutes: 1
buffer_time:
  minutes: 5
rule.yml
type: "frequency"
index: "filebeat-*"
realert:
  minutes: 5
num_events: 6
timeframe:
  minutes: 30
filter:
- term:
    monitor.status.keyword: "down"
- terms:
    monitor.id.keyword: ["Application-1", "Application-2"]
Let's say Elastalert2 runs at the following times:
5:00
5:05
5:10
5:15
5:20
5:25
5:30
5:35
and let's say Heartbeat runs at the following times:
5:00
5:05
5:10
5:15
5:20
5:25
5:30
5:35
Now what happens is that:
- It counts the hits separately, 3 for Application-1 and 3 for Application-2, so the alert triggers at 5:10. I expected it to group Application-1 and Application-2 together, reach 6 hits, and trigger the alert at 5:25.
Also, once it triggers an alert, it waits for another 6 events before triggering again. I want it to look back over the last 30 minutes from the current time on every run and trigger the alert whenever the condition holds. For example, it should alert at 5:25, 5:30 and 5:35, because the system has still been down for the past 30 minutes from the current time.
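To make the expected behaviour concrete, here is a minimal Python sketch of the logic I am after. It is not Elastalert2 code. The function name `should_alert`, the example date, and the choice to count each 5-minute check once for the whole group (no matter how many applications are down in that check) are my own assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # look back 30 minutes from "now"
THRESHOLD = 6                   # number of down checks needed to alert


def should_alert(now, down_events):
    """Return True if at least THRESHOLD distinct checks were down in the window.

    down_events is an iterable of (timestamp, monitor_id) pairs, one per
    "down" document. Both applications are treated as one group, so a check
    time counts once even if both applications were down.
    """
    window_start = now - WINDOW
    down_checks = {ts for ts, _monitor_id in down_events
                   if window_start < ts <= now}
    return len(down_checks) >= THRESHOLD


# Hypothetical data: both applications are down at every check, 5:00 to 5:35.
start = datetime(2024, 1, 1, 5, 0)  # arbitrary example date
check_times = [start + timedelta(minutes=5 * i) for i in range(8)]
events = [(ts, app)
          for ts in check_times
          for app in ("Application-1", "Application-2")]

# Evaluate the rule at each run time, using only events seen so far.
alerts = [t.strftime("%H:%M") for t in check_times
          if should_alert(t, [e for e in events if e[0] <= t])]
print(alerts)  # ['05:25', '05:30', '05:35']
```

With this logic the first alert fires at 5:25 and then again at 5:30 and 5:35, because each run re-checks the last 30 minutes instead of waiting for 6 new events.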