Manage Fault Tolerance on Heartbeat

ea1987 · October 28, 2019, 1:48pm

Hi guys,
how is it possible to manage fault tolerance (FT) with this Beat?
Here is a brief example:

Host A: heartbeat_instance_1 -> ping Host C
Host B: heartbeat_instance_2 -> ping Host C

How can I keep both instances from performing the same ping on Host C and have them work in active/passive mode instead? Is that possible? If not, are there other approaches? I definitely don't want to run only one Heartbeat (HB) instance, because a fault in that instance would leave a gap in uptime monitoring.
Thanks

Andrew_Cholakian1 · October 28, 2019, 3:16pm

It's currently advised to simply run both heartbeats in an active/active fashion, setting different geo.location.name values for each.

We don't currently support active/passive but we may in the future.

Is active/active not possible for your use case? If not, we'd love to hear more about your use case.

ea1987 · October 28, 2019, 4:25pm

Hi Andrew,
yes, active/active is possible, but consider the following use case:

Host A: heartbeat_instance_1 -> HTTP request to Web Service C
Host B: heartbeat_instance_2 -> HTTP request to Web Service C

Clearly, the HTTP requests to WS C will double, adding some request overhead, because two HB instances are doing the same job.

Any advice on this?
Thanks

Andrew_Cholakian1 · October 28, 2019, 6:40pm

The benefit of an active/active configuration is that it's very robust. There's very little to go wrong in terms of failover (indeed, active/passive requires each Heartbeat to monitor the other in some form). Additionally, you get more assurance that your site is up because it is measured from two separate locations.

Generally speaking, the impact of Heartbeat checks is minimal: one check every second at the very high end, with most people running checks every 10s or 1m.
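
To put rough numbers on this, here is a minimal Python sketch. It is not part of Heartbeat, and the function name `requests_per_second` is made up for illustration. It computes the combined request rate one target sees when several instances run the same check at a fixed interval.

```python
def requests_per_second(instances: int, interval_seconds: float) -> float:
    """Combined check rate seen by one target when every instance
    runs the same check once per interval (illustrative helper only)."""
    if instances < 0 or interval_seconds <= 0:
        raise ValueError("instances must be >= 0 and interval_seconds > 0")
    return instances / interval_seconds


if __name__ == "__main__":
    # The intervals mentioned above: every second, every 10s, every minute.
    for label, interval in [("1s", 1), ("10s", 10), ("1m", 60)]:
        rate = requests_per_second(2, interval)
        print(f"2 instances, every {label}: {rate:.3f} requests/s")
```

With two instances this works out to 2 requests per second at a 1s interval, 0.2 at 10s, and about 0.033 at 1m.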

ea1987 · November 4, 2019, 11:43am

That's fine. Still, I'm confused about how to properly manage the checks on both instances, because I would like to prevent them from performing the same check in the same time interval. I can easily handle this by starting the instances at different times, but that is manual management, which I would like to avoid. Maybe centralized management would help.
I hope you understand what I mean.
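
To illustrate what I mean by starting the instances at different times, here is a minimal Python sketch. It assumes the total number of instances is known in advance, and `start_offsets` is a hypothetical helper, not a Heartbeat setting. It spreads the start delays evenly across one check interval.

```python
from typing import List


def start_offsets(instances: int, interval_seconds: float) -> List[float]:
    """Evenly spaced start delays (in seconds), one per instance, so that
    no two instances begin their check cycle at the same moment.
    Hypothetical helper for illustration; not a Heartbeat feature."""
    if instances <= 0 or interval_seconds <= 0:
        raise ValueError("instances and interval_seconds must be > 0")
    step = interval_seconds / instances
    return [i * step for i in range(instances)]


if __name__ == "__main__":
    # Two instances checking every 10 seconds: the second starts 5s later.
    print(start_offsets(2, 10))
```

Each instance would wait for its own offset before starting. This only holds as long as all instances use the same interval and none of them is restarted at an arbitrary time, which is exactly the manual bookkeeping I would like to avoid.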

Thanks

Andrew_Cholakian1 · November 6, 2019, 9:06pm

@ea1987 can you help us understand why it's so important for your app not to have overlapping checks? Is it important for performance? If so, can you quantify the impact in terms of resources?

ea1987 · November 14, 2019, 2:05pm

Sorry for the late reply. In my scenario overlapping checks are not that bad, but I'm wondering what the right implementation would be when I need to perform checks on a large number of instances/servers etc. using more than two Heartbeat instances.
Maybe my doubts are more general and relate to a kind of Beats management platform (I'm thinking of a central Beats management server similar to the APM Agent Configuration). Is that a legitimate consideration?
Thanks

Andrew_Cholakian1 · November 26, 2019, 8:44pm

I think we can definitely improve our docs here. I've opened https://github.com/elastic/uptime/issues/121 to track this need, and I'll try to write some docs on this over the next few weeks.

system · December 24, 2019, 8:44pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
