I'm running Beats 5.4.2 and keep hitting an issue that persists through restarts of both Heartbeat and the host system.

Heartbeat works for a while (there is no apparent pattern to how long) and then stops starting new scheduled jobs.

In the logs I see a message like this for each monitored IP address:

2019-05-16T01:01:00+10:00 INFO Scheduled job 'icmp-ip@x.x.x.x' already active.

Heartbeat is monitoring over 2000 IP addresses, and this setup has been working for over a year with no changes to the system.

Here is my heartbeat.yml config file:


heartbeat.monitors:
- type: icmp
  name: group1
  schedule: '0 1-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group1.json
    interval: 5s

- type: icmp
  name: group2
  schedule: '0 2-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group2.json
    interval: 5s

- type: icmp
  name: group3
  schedule: '0 3-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group3.json
    interval: 5s

#================================ Outputs =====================================

#------------------------------- File output -----------------------------------
output.file:
  # Boolean flag to enable or disable the output module.
  enabled: true

  # Path to the directory where to save the generated files. The option is
  # mandatory.
  path: "/path/to/event/file"

  # Name of the generated files. The default is `heartbeat` and it generates
  # files: `heartbeat`, `heartbeat.1`, `heartbeat.2`, etc.
  filename: heartbeat

  # Maximum size in kilobytes of each file. When this size is reached, and on
  # every heartbeat restart, the files are rotated. The default value is 10240
  # kB.
  rotate_every_kb: 10000

  # Maximum number of files under path. When this number of files is reached,
  # the oldest file is deleted and the rest are shifted from last to first. The
  # default is 7 files.
  number_of_files: 7

#================================ Logging =====================================

#logging.level: debug
logging.to_files: true
logging.files:
  path: /path/to/log/destination
  name: heartbeat.log
  keepfiles: 7
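
To spell out how the three schedules interleave, here is a quick Python sketch (my own illustration, not part of Heartbeat) that expands the minute field of each cron expression. It assumes the 7-field format has seconds first and minutes second, and it only handles the plain `start-end/step` form used in this config. The function names are made up for the example.

```python
def expand_minutes(field):
    """Expand a cron minute field of the form 'start-end/step' into a list."""
    rng, step = field.split("/")
    start, end = (int(x) for x in rng.split("-"))
    return list(range(start, end + 1, int(step)))


# Schedules copied from the monitors in heartbeat.yml.
schedules = {
    "group1": "0 1-59/3 * * * * *",
    "group2": "0 2-59/3 * * * * *",
    "group3": "0 3-59/3 * * * * *",
}


def minutes_for(schedule):
    # Assumption: field 0 is seconds, field 1 is minutes.
    return expand_minutes(schedule.split()[1])


for name, sched in schedules.items():
    mins = minutes_for(sched)
    print(name, mins[:3], "...", mins[-1], f"({len(mins)} runs/hour)")
```

Under those assumptions each group fires every third minute, offset by one minute from the next group, and no group fires at minute 0.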

It's a fairly simple setup, but a large number of IP addresses are being monitored. I actually had to increase the read/write socket buffer memory limits on the host system for all the pings to go through:

net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216
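
For scale, these values are in bytes. A small Python sketch (illustrative only; `parse_sysctl` is a made-up helper, not a system tool) that parses the lines above and converts them to MiB:

```python
SYSCTL = """\
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        values[key.strip()] = int(value)
    return values


values = parse_sysctl(SYSCTL)
for key, size in values.items():
    print(f"{key}: {size // 2**20} MiB")
```

So the defaults are 8 MiB and the maximums are 16 MiB for both read and write buffers.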

The host system is CentOS 7.

Please let me know if any more details are needed.