Initially I did not add a monitor.id or monitor.name.
Then, when I added those - without changing anything else, not the host or anything - I got massive errors in /var/log/messages:
Jan 28 18:20:35 alp-aot-ccm10 heartbeat: 2020-01-28T18:20:35.803+0100#011ERROR#011[reload]#011cfgfile/list.go:96#011Error creating runner from config: monitor ID https://my.site/?our-monitoring is configured for multiple monitors! IDs must be unique values.
And gradually the number of runners increased. I got one doc, 2 docs, 3 docs, etc. per iteration.
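For reference, my monitors are defined in /etc/heartbeat/monitors.d/*.yml files, and as far as I understand the error, every monitor needs its own unique id - roughly like this (URLs, names and schedule are just placeholders):

- type: http
  id: our-monitoring-http
  name: our-monitoring
  urls: ["https://my.site/?our-monitoring"]
  schedule: '@every 30s'

- type: http
  id: our-other-check-http
  name: our-other-check
  urls: ["https://my.site/?our-other-check"]
  schedule: '@every 30s'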
So I deleted all documents from elasticsearch with the auto-generated id:
curl -s -H 'Content-Type: application/json' -X POST 'http://localhost:9200/heartbeat-7.5.2-2020.01.28-000001/_delete_by_query?q=monitor.id:auto-*'
Then I tried adjusting the query parameter from our-monitoring to our-monitoring-http and the errors were silenced.
However, based on the data I'm receiving, both the our-monitoring and our-monitoring-http checks are still running.
It seems that if I adjust the yaml files and restart heartbeat, the old checks/runners are still there.
Is it possible to remove old config? Restarts do not help.
What is the correct procedure for editing an existing configuration?
Sorry to hear your experience hasn't been great @sastorsl
I think there are a few things going on here:
The Uptime UI shows all active monitors over a given time range. This means that if you change the monitor.id for a monitor, the data that was sent when it was active is still present. So, even if you deleted a monitor 4 weeks ago, enlarging the time window to cover, say, 5 weeks, will include that monitor in the list. We can probably improve the UX here, and generally only show monitors that were active at the end of the range by default.
You can delete old data of course, by either deleting entire indices, or using the delete_by_query API to delete all monitors matching whatever criteria you like.
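For example, something like this would remove all documents for one specific monitor (the index pattern and the monitor.id value are just examples):
curl -s -H 'Content-Type: application/json' -X POST 'http://localhost:9200/heartbeat-*/_delete_by_query' -d '{"query":{"term":{"monitor.id":"our-monitoring"}}}'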
Au contraire, so far the experience has been better than average.
The benefit of a ready-made tool that fits into our existing infrastructure: what's not to like?
I had two issues:
The old config was not removed - and I could not figure out why.
I suspected that I had made a configuration error.
As it turned out, the error was indeed at my end - and I have solved it.
While creating the configuration I wrote a short shell script to create the /etc/heartbeat/monitors.d/*.yml files, since I had set heartbeat.config.monitors.reload.enabled: true with that directory/path.
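For reference, the reload setup in heartbeat.yml looks something like this (the reload.period value is only an example):
heartbeat.config.monitors:
  path: /etc/heartbeat/monitors.d/*.yml
  reload.enabled: true
  reload.period: 5s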
Somewhere during my testing I had created a "dotfile", probably because of an unresolved parameter in my shell script.
So heartbeat read all my /etc/heartbeat/monitors.d/https*.yml files, but also one named /etc/heartbeat/monitors.d/.yml ... and the latter contained all 102 of my monitors, but with the previous config I was trying out.
Removing the .yml (dot)file made heartbeat immediately (on the next config reload) remove the extra monitors - and now everything works as expected.
Lessons learned:
Set up debug logging while experimenting (see the snippet after this list)
Check for dotfiles
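For the debug logging part, that just means something like this in heartbeat.yml (the selectors line is optional):
logging.level: debug
logging.selectors: ["*"]
And checking for dotfiles is as simple as ls -la /etc/heartbeat/monitors.d/ - a plain ls would not have shown the stray .yml file.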
I will create some salt states to manage the configuration when I have a satisfactory template.