Hi,
I'm running Beats version 5.4.2 and keep hitting an issue that persists through restarts of both Heartbeat and the host system.
Heartbeat works for a while (there doesn't seem to be a pattern to how long) and then stops starting new scheduled jobs.
In the logs I see a message like this for each individual monitored IP address:
2019-05-16T01:01:00+10:00 INFO Scheduled job 'icmp-ip@x.x.x.x' already active.
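To check whether every job is affected or only some of them, the messages can be tallied per job. This is only a rough sketch that assumes the log line format shown above; the function name and the sample lines are made up for illustration (`y.y.y.y` is a placeholder, not a real entry from my logs).

```python
import re
from collections import Counter

# Matches the job name inside: Scheduled job '<name>' already active.
# Assumes the log format shown in the message above.
ALREADY_ACTIVE = re.compile(r"Scheduled job '([^']+)' already active")


def count_already_active(lines):
    """Return a Counter mapping job name -> number of 'already active' messages."""
    counts = Counter()
    for line in lines:
        match = ALREADY_ACTIVE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice this would be the lines of the Heartbeat log file.
    sample_lines = [
        "2019-05-16T01:01:00+10:00 INFO Scheduled job 'icmp-ip@x.x.x.x' already active.",
        "2019-05-16T01:04:00+10:00 INFO Scheduled job 'icmp-ip@x.x.x.x' already active.",
        "2019-05-16T01:04:00+10:00 INFO Scheduled job 'icmp-ip@y.y.y.y' already active.",
    ]
    for job, n in count_already_active(sample_lines).most_common():
        print(job, n)
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.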
Heartbeat is monitoring over 2000 IP addresses, and this has been working for over a year with no changes to the system.
Here is my heartbeat.yml config file:
heartbeat.monitors:
- type: icmp
  name: group1
  schedule: '0 1-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group1.json
    interval: 5s
- type: icmp
  name: group2
  schedule: '0 2-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group2.json
    interval: 5s
- type: icmp
  name: group3
  schedule: '0 3-59/3 * * * * *'
  watch.poll_file:
    path: /data/beats/heartbeat/monitors/group3.json
    interval: 5s
#================================ Outputs =====================================
#------------------------------- File output -----------------------------------
output.file:
  # Boolean flag to enable or disable the output module.
  enabled: true
  # Path to the directory where to save the generated files. The option is
  # mandatory.
  path: "/path/to/event/file"
  # Name of the generated files. The default is `heartbeat` and it generates
  # files: `heartbeat`, `heartbeat.1`, `heartbeat.2`, etc.
  filename: heartbeat
  # Maximum size in kilobytes of each file. When this size is reached, and on
  # every heartbeat restart, the files are rotated. The default value is 10240
  # kB.
  rotate_every_kb: 10000
  # Maximum number of files under path. When this number of files is reached,
  # the oldest file is deleted and the rest are shifted from last to first. The
  # default is 7 files.
  number_of_files: 7
#================================ Logging =====================================
#logging.level: debug
logging.to_files: true
logging.files:
  path: /path/to/log/destination
  name: heartbeat.log
  keepfiles: 7
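For reference, this is how I understand the three schedules to interleave. The sketch below assumes the 7-field cron form is seconds, minutes, hours, day of month, month, day of week, year, so the second field is the minute. The `expand_field` helper is my own simplified illustration, not Heartbeat's parser, and it only handles `*`, `a-b`, `a-b/n` and comma lists.

```python
def expand_field(field, lo=0, hi=59):
    """Expand one cron field (e.g. '1-59/3' or '*') into a sorted list of ints.

    Simplified illustration: supports '*', 'a', 'a-b', an optional '/step',
    and comma-separated combinations of those.
    """
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            first, last = rng.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


def minutes_for(schedule):
    """Return the minutes at which a 7-field schedule fires (assumes field 2 is minutes)."""
    return expand_field(schedule.split()[1])


# Schedules copied from the heartbeat.yml above.
SCHEDULES = {
    "group1": "0 1-59/3 * * * * *",
    "group2": "0 2-59/3 * * * * *",
    "group3": "0 3-59/3 * * * * *",
}

if __name__ == "__main__":
    for name, expr in SCHEDULES.items():
        print(name, minutes_for(expr))
```

Under that reading, each group runs every three minutes and the groups are offset from each other by one minute, so only one group's pings should start in any given minute. No group is scheduled at minute 0 of the hour.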
It's a fairly simple setup, but a large number of IP addresses are being monitored. I actually had to increase the read/write socket buffer memory limits on the host system for all the pings to go through:
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216
The host system is CentOS 7.
Please let me know if any more details are needed.