I'm running Heartbeat from Beats version 5.4.2 and have a recurring issue that persists across restarts of both Heartbeat and the host system.
Heartbeat works for a period (there doesn't seem to be a pattern as to how long) and then stops starting new scheduled jobs.
I notice in the logs there are messages for each individual monitored IP address:
2019-05-16T01:01:00+10:00 INFO Scheduled job 'firstname.lastname@example.org' already active.
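To gauge how many jobs are affected, a small script along these lines could tally the "already active" messages per job. This is only a sketch: it assumes every such line follows the format quoted above, and the job names in the sample input are hypothetical placeholders.

```python
import re
from collections import Counter

# Matches the message format quoted above; group 1 is the job name between the quotes.
ACTIVE_RE = re.compile(r"^\S+ INFO Scheduled job '([^']+)' already active\.")


def count_already_active(lines):
    """Return a Counter mapping job name -> number of 'already active' messages."""
    counts = Counter()
    for line in lines:
        match = ACTIVE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; in practice, iterate over the Heartbeat log file instead.
sample = [
    "2019-05-16T01:01:00+10:00 INFO Scheduled job 'job-a' already active.",
    "2019-05-16T01:04:00+10:00 INFO Scheduled job 'job-a' already active.",
    "2019-05-16T01:04:00+10:00 INFO Scheduled job 'job-b' already active.",
    "2019-05-16T01:04:01+10:00 INFO some unrelated message",
]
counts = count_already_active(sample)
for job, n in counts.most_common():
    print(job, n)
```

Jobs whose count keeps growing between runs are the ones that never finished their previous run.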
Heartbeat is monitoring over 2000 IP addresses, and this setup has been working for over a year with no changes to the system.
Here is my heartbeat.yml config file:
heartbeat.monitors:
- type: icmp
  name: group1
  schedule: '0 1-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group1.json
    interval: 5s
- type: icmp
  name: group2
  schedule: '0 2-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group2.json
    interval: 5s
- type: icmp
  name: group3
  schedule: '0 3-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group3.json
    interval: 5s

#================================ Outputs =====================================

#------------------------------- File output -----------------------------------
output.file:
  # Boolean flag to enable or disable the output module.
  enabled: true

  # Path to the directory where to save the generated files. The option is
  # mandatory.
  path: "/path/to/event/file"

  # Name of the generated files. The default is `heartbeat` and it generates
  # files: `heartbeat`, `heartbeat.1`, `heartbeat.2`, etc.
  filename: heartbeat

  # Maximum size in kilobytes of each file. When this size is reached, and on
  # every heartbeat restart, the files are rotated. The default value is 10240
  # kB.
  rotate_every_kb: 10000

  # Maximum number of files under path. When this number of files is reached,
  # the oldest file is deleted and the rest are shifted from last to first. The
  # default is 7 files.
  number_of_files: 7

#================================ Logging =====================================
#logging.level: debug
logging.to_files: true
logging.files:
  path: /path/to/log/destination
  name: heartbeat.log
  keepfiles: 7
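To make the staggering of the three schedules explicit, here is a rough sketch that expands the minute field of each expression. It assumes the seven fields are seconds, minutes, hours, day-of-month, month, day-of-week and year, and that `start-end/step` has the usual cron meaning (begin at `start`, advance by `step`, stop at `end`). The helper name `expand_minute_field` is made up for this illustration.

```python
def expand_minute_field(field):
    """Expand a cron minute field of the form 'start-end/step' into a list of minutes."""
    range_part, step = field.split("/")
    start, end = (int(x) for x in range_part.split("-"))
    return list(range(start, end + 1, int(step)))


# Schedules copied from the monitor definitions in the config above.
schedules = {
    "group1": "0 1-59/3 * * * * *",
    "group2": "0 2-59/3 * * * * *",
    "group3": "0 3-59/3 * * * * *",
}

# Assumed field order: the minute field is the second one (index 1).
minutes = {name: expand_minute_field(expr.split()[1]) for name, expr in schedules.items()}

for name, mins in minutes.items():
    print(name, mins[:4], "...", mins[-1])

# Minutes of the hour in which no group is scheduled at all.
covered = set().union(*minutes.values())
uncovered = sorted(set(range(60)) - covered)
print("minutes with no group scheduled:", uncovered)
```

Under those assumptions each group starts a run every three minutes, offset from the others by one minute, so a run that takes longer than three minutes would still be active when its next start time arrives. Minute 0 of each hour falls outside all three ranges.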
It's a fairly simple setup, but a large number of IP addresses are being monitored. I had to increase the read/write socket buffer memory limits on the host system in order for all the pings to go through:
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216
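For reference, these values are in bytes. The sketch below parses the settings and prints them in MiB. It also shows where the live value of each key can be read back on Linux, which may help confirm that the limits are still in effect after a reboot. The function names are invented for this example.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines (sysctl.conf style) into a dict of integer values."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def proc_path(key):
    """Map a sysctl key to its file under /proc/sys (dots become slashes)."""
    return "/proc/sys/" + key.replace(".", "/")


# The four settings listed above.
conf = """
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216
"""

settings = parse_sysctl(conf)
for key, value in settings.items():
    print(f"{key}: {value // (1024 * 1024)} MiB (live value in {proc_path(key)})")
```

So the defaults are 8 MiB and the maximums are 16 MiB for both the receive and send buffers.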
The host system is CentOS 7.
Please let me know if any more details are needed.