Heartbeat not running all monitors when scheduler limit is set

bwright1558 · November 13, 2019, 11:17pm

I'm using Heartbeat version 7.4.2. I've got 40+ HTTP monitors configured, all of which use the cron syntax for scheduling. I'm using '0 0 */2 * * * *', meaning every 2 hours. I also have scheduler.limit set to 10, so only 10 tasks run at a time. When the time rolls around to run the monitors, not all of them run. In fact, the skipped ones don't run until 2 hours later (sometimes not even then). Sometimes there are monitors that don't run for 6+ hours.
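For reference, a minimal sketch of the setup described above, assuming the standard heartbeat.yml layout. The monitor name and URL are placeholders, not values from the actual config:

```yaml
# heartbeat.yml (sketch; name and URL are hypothetical placeholders)
heartbeat.monitors:
- type: http
  name: example-service                    # hypothetical monitor name
  urls: ["http://localhost:8080/health"]   # hypothetical endpoint
  schedule: '0 0 */2 * * * *'              # the cron expression from this post
# ...one entry like this for each of the 40+ monitors

heartbeat.scheduler:
  limit: 10                                # at most 10 tasks run concurrently
```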

Am I misunderstanding how the scheduler limit should work? Or is this a bug? Thoughts and suggestions please.

Andrew_Cholakian1 · November 14, 2019, 2:02pm

That cron config '0 0 */2 * * * *' means every 2 days at midnight. I think you want '0 */2 * * * * *', which means the first minute every 2 hours. I'd recommend using the syntax '@every 2h' instead, since it's much more straightforward to read.

EDIT: My mistake, I was reading it wrong, you are right, that is every 2 hours. I'm going to try and repro this.
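To spell out the reading, here is a sketch of how the seven fields line up, assuming Heartbeat's cron-like format where seconds come first and the year comes last. The interval form is shown commented out as the alternative:

```yaml
# Fields:  sec  min  hour  day-of-month  month  day-of-week  year
#          0    0    */2   *             *      *            *
schedule: '0 0 */2 * * * *'   # second 0 of minute 0 of every 2nd hour

# Interval form mentioned above:
# schedule: '@every 2h'
```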

That said, I'd be interested to hear what use case you have for running Heartbeat so infrequently. I've been mulling whether we should put a cap on the maximum amount of time between checks (there's some potential to improve perf in ES queries if we can depend on that).

Generally people don't run checks more than, say, 5 minutes apart. The main reason is that if a check fails for a transient reason, you want to see how long it took to recover. With a two-hour schedule, it can take 2 full hours before the next check shows whether things recovered.

bwright1558 · November 14, 2019, 3:24pm

Please don't cap the maximum amount of time between checks. At least allow checks every couple of hours. We do have good reason for running our checks infrequently. We have a fairly unique use case, and I hope Heartbeat is the solution to all our maintenance woes. A cap of less than an hour would make Heartbeat useless to us, so please don't add one. I'll message you our specific use case.

Andrew_Cholakian1 · November 18, 2019, 5:41pm

I've opened this PR https://github.com/elastic/beats/pull/14569 which should solve the underlying issue.

system · December 16, 2019, 5:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
