Heartbeat scheduled jobs already active

packland · May 15, 2019, 3:17pm

Hi,

I'm running Beats version 5.4.2 and keep hitting an issue that persists across restarts of both Heartbeat and the host system.

Heartbeat works for a while (there doesn't seem to be a pattern to how long) and then stops starting new scheduled jobs.

The logs show a message like this for each monitored IP address:

2019-05-16T01:01:00+10:00 INFO Scheduled job 'icmp-ip@x.x.x.x' already active.

Heartbeat is monitoring over 2000 IP addresses, and this has worked for over a year with no changes to the system.

Here is my heartbeat.yml config file:

heartbeat.monitors:

- type: icmp
  name: group1
  schedule: '0 1-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group1.json
    interval: 5s

- type: icmp
  name: group2
  schedule: '0 2-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group2.json
    interval: 5s

- type: icmp
  name: group3
  schedule: '0 3-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group3.json
    interval: 5s

#================================ Outputs =====================================

#------------------------------- File output -----------------------------------
output.file:
  # Boolean flag to enable or disable the output module.
  enabled: true

  # Path to the directory where to save the generated files. The option is
  # mandatory.
  path: "/path/to/event/file"

  # Name of the generated files. The default is `heartbeat` and it generates
  # files: `heartbeat`, `heartbeat.1`, `heartbeat.2`, etc.
  filename: heartbeat

  # Maximum size in kilobytes of each file. When this size is reached, and on
  # every heartbeat restart, the files are rotated. The default value is 10240
  # kB.
  rotate_every_kb: 10000

  # Maximum number of files under path. When this number of files is reached,
  # the oldest file is deleted and the rest are shifted from last to first. The
  # default is 7 files.
  number_of_files: 7


#================================ Logging =====================================

#logging.level: debug
logging.to_files: true
logging.files:
  path: /path/to/log/destination
  name: heartbeat.log
  keepfiles: 7

It's a fairly simple setup, but there is a large volume of IP addresses being monitored. I actually had to increase the read/write socket buffer memory limits on the host system for all the pings to go through:

net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216

The host system is CentOS 7.

Please let me know if any more details are needed.

Andrew_Cholakian1 · May 29, 2019, 9:56pm

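This can be mitigated by lowering the timeout. Try setting the monitor's timeout option to something shorter (maybe 2s). The default (16s) is quite long, so checks against unresponsive hosts can still be running, and tying up resources, when the next scheduled run comes around.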
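
As a rough sketch, this is what that could look like on your group1 monitor. Everything except the timeout line is copied from your config; the 2s value is only a starting point and an assumption on my part, so tune it for your network. The same line would go on group2 and group3.

```yaml
heartbeat.monitors:

- type: icmp
  name: group1
  schedule: '0 1-59/3 * * * * *'
  # Assumed value: shorter than the default so that a slow or dead host
  # releases its job well before the next scheduled run.
  timeout: 2s
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group1.json
    interval: 5s
```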

The real fix is to time each run from the finish of the previous check rather than on a strict schedule. That wouldn't apply in this case, though, since you're using a cron expression; it would apply only to the @every syntax.
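
For reference, the @every form of a schedule looks like the sketch below. The 3m interval is an assumption chosen to roughly match your current cadence, and this is not a drop-in replacement: it only shows the syntax that such a change would affect.

```yaml
heartbeat.monitors:

- type: icmp
  name: group1
  # Interval-based schedule instead of a cron expression.
  # 3m is an assumed value, not taken from the original config.
  schedule: '@every 3m'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group1.json
    interval: 5s
```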

I've opened up two issues to address these concerns.

