Heartbeat stops pinging http(s) routes

Hello,

I have installed heartbeat (heartbeat-7.17.6-windows-x86_64) in order to monitor the availability of some URLs.

The problem is that it stops pinging for no apparent reason (I see nothing unusual in the logs) while the service keeps running.
It happens every one or two days.
I have saved the parts of the log files covering one or two hours around the last data received before the restart, but it does not seem possible to post them here.

Any help is welcome. Thanks.

Can you share the configuration for the affected monitors, with any sensitive info redacted?

Hello Andrew,

Here is the heartbeat.monitors: section of heartbeat.yml:

heartbeat.monitors:
- type: http
  enabled: true
  id: heartbeat_http_check_ID
  name: HTTP check with heartbeat
  urls:
      - "http://xconf-qa.equant.com:8080/Ping"
      - "https://xconf-qa.equant.com/XCONF.API/api/Test"
      - "http://10.238.116.238/XConfWebAPI/info/warningCodes"
      - "https://xconf-qa.equant.com/XCONF"

  schedule: '@every 1m'
  ssl.verification_mode: none
  

Let me know what other information could be useful.
Thank you.

Hi @ppic,

Thanks for the information provided. We'll try to replicate it on our side, but considering the conditions under which it happens, it might take a while.
In the meantime, could you help us understand a bit more about the context you're running these monitors in:

  • Are these the only monitors that Heartbeat is running? If there're others, do those continue running?
  • Does Heartbeat still generate logs after it stops running monitors? There's usually an event periodically emitted when there're no documents to index, do those still happen?
  • How is Heartbeat running inside Windows, is it installed as a service or running as a bare binary?

Hi Emilio,

Here are the answers to your questions:

  • Are these the only monitors that Heartbeat is running? If there're others, do those continue running?
    ==> Yes, the only heartbeat.monitors: section in heartbeat.yml is the one shown above.

  • Does Heartbeat still generate logs after it stops running monitors? There's usually an event periodically emitted when there're no documents to index, do those still happen?
    ==> Yes, Heartbeat continues generating logs.
    With logging.level: debug, we can see that there are no more Publish event: sections in the log file after it stops pinging. The same goes for lines such as:
    DEBUG [scheduler] scheduler/scheduler.go:188 Job 'heartbeat_http_check_ID' started.
    I don't see any event periodically emitted when there are no documents to index. Which tag is it?
    Is there a way to send you the relevant parts of the log files?

  • How is Heartbeat running inside Windows, is it installed as a service or running as a bare binary?
    ==> As a service. I followed the instructions at:
    https://www.elastic.co/guide/en/beats/heartbeat/7.17/heartbeat-installation-configuration.html

I have modified the monitor section to keep only one URL to ping (the first one, with 'Ping'). We will see whether that changes anything.

Thank you.
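
For anyone trying to capture the same scheduler lines, the following is a minimal sketch of what the debug logging setup mentioned above could look like. Only logging.level: debug comes from this thread. The selector name is an assumption based on the [scheduler] tag in the log line quoted above, and the keepfiles value is an arbitrary illustration.

```yaml
# Sketch only: logging settings for heartbeat.yml (not the poster's actual file).
logging.level: debug              # from the thread: debug level was enabled
logging.selectors: ["scheduler"]  # assumption: selector matches the [scheduler] log tag
logging.to_files: true            # write logs to files so the hours around a stall are kept
logging.files:
  keepfiles: 7                    # arbitrary: number of rotated log files to retain
logging.metrics.enabled: true     # keep the periodic internal metrics log lines
logging.metrics.period: 30s       # interval between metrics log lines
```

Restricting debug output to one selector keeps the files small, but debug lines from other components (possibly including the Publish event: entries) may then be filtered out. Removing the logging.selectors line logs everything at debug level.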

Hello,

I was out of the office for several days, so I do not have many results to show.

When heartbeat.yml is configured with 4 URLs, it generally works for around 24h.
With 3 URLs, it worked for around 35h.
With 1 URL, it kept working for 2 to 4 days, at which point I stopped the service to change the test.

The number of URLs to ping seems to play a role in this anomaly.

In "Discover", I saw another thing that might be interesting, here is the relevant sample:

Feb 25, 2023 @ 14:02:17.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 14:02:21.000	502	GET	/Ping
Feb 25, 2023 @ 14:02:25.000	302	GET	/XCONF
Feb 25, 2023 @ 14:03:17.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 14:03:23.000	502	GET	/Ping
Feb 25, 2023 @ 14:03:25.000	302	GET	/XCONF
Feb 25, 2023 @ 14:04:17.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 14:04:22.000	502	GET	/Ping
Feb 25, 2023 @ 14:04:25.000	302	GET	/XCONF
Feb 25, 2023 @ 14:05:17.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 14:05:22.000	502	GET	/Ping
Feb 25, 2023 @ 14:05:25.000	302	GET	/XCONF
Feb 25, 2023 @ 14:06:16.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 14:06:20.000	502	GET	/Ping
Feb 25, 2023 @ 14:06:25.000	302	GET	/XCONF
Feb 25, 2023 @ 14:07:17.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 14:07:25.000	302	GET	/XCONF
Feb 25, 2023 @ 14:08:15.000	502	GET	/Ping
Feb 25, 2023 @ 14:08:17.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 14:08:25.000	302	GET	/XCONF
Feb 25, 2023 @ 14:10:17.000	502	GET	/Ping
Feb 25, 2023 @ 14:11:15.000	502	GET	/Ping
Feb 25, 2023 @ 14:14:19.000	502	GET	/Ping
Feb 25, 2023 @ 14:15:17.000	502	GET	/Ping
Feb 25, 2023 @ 14:18:15.000	502	GET	/Ping
Feb 25, 2023 @ 14:19:13.000	502	GET	/Ping
Feb 25, 2023 @ 14:20:18.000	502	GET	/Ping
Feb 25, 2023 @ 14:22:42.000	502	GET	/Ping
Feb 25, 2023 @ 14:24:46.000	502	GET	/Ping
Feb 25, 2023 @ 14:25:47.000	502	GET	/Ping
Feb 25, 2023 @ 14:26:50.000	502	GET	/Ping
Feb 25, 2023 @ 14:27:47.000	502	GET	/Ping
Feb 25, 2023 @ 14:28:48.000	502	GET	/Ping
Feb 25, 2023 @ 14:29:50.000	502	GET	/Ping
Feb 25, 2023 @ 14:30:51.000	502	GET	/Ping
Feb 25, 2023 @ 14:34:49.000	502	GET	/Ping
Feb 25, 2023 @ 14:38:52.000	502	GET	/Ping
Feb 25, 2023 @ 14:39:49.000	502	GET	/Ping                 <== stop
Feb 25, 2023 @ 18:55:04.000	302	GET	/XCONF                <== restart
Feb 25, 2023 @ 18:55:18.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 18:56:03.000	302	GET	/XCONF
Feb 25, 2023 @ 18:56:03.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 18:57:03.000	302	GET	/XCONF
Feb 25, 2023 @ 18:57:03.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 18:58:03.000	302	GET	/XCONF
Feb 25, 2023 @ 18:58:03.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 18:58:46.000	502	GET	/Ping
Feb 25, 2023 @ 18:59:03.000	302	GET	/XCONF
Feb 25, 2023 @ 18:59:05.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 19:00:03.000	302	GET	/XCONF
Feb 25, 2023 @ 19:00:03.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 19:00:11.000	502	GET	/Ping
Feb 25, 2023 @ 19:01:04.000	302	GET	/XCONF
Feb 25, 2023 @ 19:01:04.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 19:01:30.000	502	GET	/Ping
Feb 25, 2023 @ 19:01:30.000	400	GET	/Ping
Feb 25, 2023 @ 19:02:03.000	302	GET	/XCONF
Feb 25, 2023 @ 19:02:03.000	200	GET	/XCONF.API/api/Test
Feb 25, 2023 @ 19:02:41.000	502	GET	/Ping

During this time, there was no change to heartbeat.yml and no service restart.
You can see that nothing is reported for more than 4 hours (from 14:39:49 to 18:55:04).

I hope this can help.
Thank you.

Hello,

Following a suggestion from a colleague (who uses Heartbeat on Linux), I split the 4 URLs into 4 separate heartbeat.monitors entries. But the result is the same: it stopped pinging after 18h.
Since Heartbeat on Windows doesn't seem to be reliable, I stopped the service and installed the "http" module of Metricbeat. That works and seems more stable.
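
For reference, splitting the URLs into one monitor each, as described above, might look like the following sketch. The URLs, schedule, and ssl.verification_mode setting are taken from the configuration posted earlier in this thread. The id and name values are hypothetical, since the actual split configuration was not posted.

```yaml
# Sketch only: one HTTP monitor per URL (id/name values are hypothetical).
heartbeat.monitors:
- type: http
  enabled: true
  id: xconf_ping                  # hypothetical id
  name: XCONF Ping                # hypothetical name
  urls: ["http://xconf-qa.equant.com:8080/Ping"]
  schedule: '@every 1m'
- type: http
  enabled: true
  id: xconf_api_test              # hypothetical id
  name: XCONF API Test            # hypothetical name
  urls: ["https://xconf-qa.equant.com/XCONF.API/api/Test"]
  schedule: '@every 1m'
  ssl.verification_mode: none     # as in the original config; disables certificate checks
- type: http
  enabled: true
  id: xconf_warning_codes         # hypothetical id
  name: XCONF Warning Codes       # hypothetical name
  urls: ["http://10.238.116.238/XConfWebAPI/info/warningCodes"]
  schedule: '@every 1m'
- type: http
  enabled: true
  id: xconf_root                  # hypothetical id
  name: XCONF Root                # hypothetical name
  urls: ["https://xconf-qa.equant.com/XCONF"]
  schedule: '@every 1m'
  ssl.verification_mode: none     # as in the original config; disables certificate checks
```

Separate monitors each get their own scheduler job and their own id in the logs, which makes it easier to see whether a single endpoint stalls or all of them stop together. As reported above, this layout did not prevent the stall.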

Hi @ppic ,

I'm glad to hear you've found a suitable workaround. We haven't been able to reproduce the issue in any of our tests; it could be related to how Heartbeat interacts with the endpoints it's reaching.

I'm sorry I don't have more information to provide; some issues are just hard to figure out.
