Heartbeat config changes create multiple runners/monitors

I have set up heartbeat-7.5.2 on a Red Hat Enterprise Linux 7.7 (RHEL7) server.

Basic setup so far - pushing to two Elasticsearch outputs.
HTTP checks - nothing fancy.

heartbeat.config.monitors:
  path: ${path.config}/monitors.d/*.yml
  reload.enabled: true
  reload.period: 5s

I have set up 102 HTTP monitors in separate configuration files.

- type: http
  schedule: '0 * * * * * *'
  id: "https://my.site/?our-monitoring-http"
  name: "https://my.site/?our-monitoringhttp"
  hosts:
    - "https://my.site/?our-monitoring-http"
  ipv4: true
  ipv6: false
  mode: any
  timeout: 5s
  check.request:
    headers:
      'User-Agent': 'Go-http-client/1.1 (CLI; RedHat; Linux x86_64) heartbeat-elastic/7.x (OUR Monitoring)'
  check.response:
    status: 200
    body: "node is ok"

Initially I did not add a monitor.id or monitor.name.
Then, when I added them - without changing anything else, not the host or anything - I got massive errors in /var/log/messages:

Jan 28 18:20:35 alp-aot-ccm10 heartbeat: 2020-01-28T18:20:35.803+0100#011ERROR#011[reload]#011cfgfile/list.go:96#011Error creating runner from config: monitor ID https://my.site/?our-monitoring is configured for multiple monitors! IDs must be unique values.

And gradually the number of runners increased: I got one doc, 2 docs, 3 docs, etc. per iteration.
So I deleted all documents from Elasticsearch with an auto-generated monitor.id:

curl -s -H 'Content-Type: application/json' -X POST 'http://localhost:9200/heartbeat-7.5.2-2020.01.28-000001/_delete_by_query?q=monitor.id:auto-*'

I then tried adjusting the query parameter from our-monitoring to our-monitoring-http, and the errors were silenced.

However, based on the data I'm receiving, both the our-monitoring and the our-monitoring-http checks are still running.

It seems that if I adjust the YAML files and restart Heartbeat, the old checks/runners are still there.

  1. Is it possible to remove the old config? Restarts do not help.
  2. What is the correct procedure to edit an existing configuration?

Sorry to hear your experience hasn't been great, @sastorsl.

I think there are a few things going on here:

  1. The Uptime UI shows all active monitors over a given time range. This means that if you change the monitor.id for a monitor, the data that was sent when it was active is still present. So, even if you deleted a monitor 4 weeks ago, enlarging the time window to cover, say, 5 weeks, will include that monitor in the list. We can probably improve the UX here, and generally only show monitors that were active at the end of the range by default.
  2. You can of course delete old data, either by deleting entire indices or by using the delete_by_query API to delete the documents for whatever monitors you like (see the sketch below).
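
For example, a request along these lines removes all documents for one old monitor.id (the index pattern and the ID value here are placeholders - adjust them to your own setup):

curl -s -H 'Content-Type: application/json' -X POST 'http://localhost:9200/heartbeat-7.5.2-*/_delete_by_query' -d '
{
  "query": {
    "term": {
      "monitor.id": "https://my.site/?our-monitoring"
    }
  }
}'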

Let us know if that answers your question!

Au contraire - so far the experience has been better than average :slight_smile:
The benefit of a ready-made tool that fits into our existing infrastructure - what's not to like?

I had two issues:

  1. The old config was not removed - and I could not figure out why.
  2. I suspected that I had made a configuration error.

However, the error was on my end - and I have solved it.

While creating the configuration I wrote a short shell script to generate the /etc/heartbeat/monitors.d/*.yml files, since I had set heartbeat.config.monitors.reload.enabled: true for that directory/path.

Somewhere during my testing I had created a "dotfile", probably because of an unresolved parameter in my shell script.

So Heartbeat read all my /etc/heartbeat/monitors.d/https*.yml files, but also one named /etc/heartbeat/monitors.d/.yml ... and the latter contained all my 102 monitors, with the previous config I had been trying out.

Removing the .yml (dot)file made heartbeat immediately (on the next config reload) remove the extra monitors - and now everything works as expected.
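
In hindsight, a simple guard in the generator script would have avoided the empty file name in the first place. A minimal sketch of the idea - the variable and file naming below are just an illustration, not my actual script:

#!/bin/bash
# Refuse to write a monitor file when the parameter did not resolve,
# so we never end up with an empty-named file like monitors.d/.yml
site="$1"
if [ -z "$site" ]; then
    echo "usage: $0 <site-hostname>" >&2
    exit 1
fi

cat > "/etc/heartbeat/monitors.d/https_${site}.yml" <<EOF
- type: http
  schedule: '0 * * * * * *'
  id: "https://${site}/?our-monitoring-http"
  name: "https://${site}/?our-monitoring-http"
  hosts:
    - "https://${site}/?our-monitoring-http"
EOF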

Lessons learned:

  • Set up debug logging while experimenting (snippets below)
  • Check for dotfiles :slight_smile:
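
For the record, both are quick to do (the paths assume the default RPM install, so adjust as needed).

Debug logging in heartbeat.yml while testing:

logging.level: debug
logging.selectors: ["*"]

And listing the monitors.d directory including dotfiles:

ls -A /etc/heartbeat/monitors.d/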

I will create some Salt states to manage the configuration once I have a satisfactory template.

Thank you @Andrew_Cholakian1 for your help.

Thanks for writing back with the full info! I think we're hitting a quirk of the path glob matching here, in that the pattern will also match dot files.

I wonder if it would make sense to make heartbeat always ignore those...

Oh, also, if you run Heartbeat in the foreground you can use heartbeat -e to output the logs to the console - that can be useful in testing!
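
For example, something like this prints everything while testing (the -d "*" flag enables all debug selectors):

heartbeat -e -d "*"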
