Manage Fault Tolerance on Heartbeat

Hi guys,
how is it possible to manage fault tolerance (FT) with this Beat?
Here is a brief example:

Host A: heartbeat_instance_1 -> ping Host C
Host B: heartbeat_instance_2 -> ping Host C

How can I make the two instances work in active/passive mode, so that they do not both perform the same ping on Host C? Is that possible? If not, are there alternative implementations? I certainly don't want to run only one Heartbeat instance, because a failure of that instance would interrupt uptime monitoring.
Thanks

It's currently advised to simply run both heartbeats in an active/active fashion, setting different geo.location.name values for each.

We don't currently support active/passive but we may in the future.

Is active/active not possible for your use case? If not, we'd love to hear more about your use case.

Hi Andrew,
yes, active/active is possible, but consider the following use case:

Host A: heartbeat_instance_1 -> HTTP request to Web Service C
Host B: heartbeat_instance_2 -> HTTP request to Web Service C

Clearly, the HTTP requests to WS C will double, adding some request overhead, because two Heartbeat instances are doing the same job.

Any advice on this?
Thanks

The benefit of an active/active configuration is that it's very robust. There's very little to go wrong in terms of failover (indeed, active/passive requires each heartbeat to monitor the other in some form). Additionally, you get more assurances that your site is up due to measuring it from two separate locations.

Generally speaking, the impact of Heartbeat checks is minimal: at the very high end a monitor runs one check every second, and most people run checks every 10s or 1m.
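
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes every instance runs the same monitor against the same target at the same interval. The function name `checks_per_minute` is made up for illustration and is not part of Heartbeat.

```python
def checks_per_minute(instances: int, interval_seconds: float) -> float:
    """Requests per minute that one target receives from all instances combined.

    Hypothetical helper for illustration only; assumes every instance
    checks the same target on the same fixed interval.
    """
    if instances < 0 or interval_seconds <= 0:
        raise ValueError("instances must be >= 0 and interval_seconds > 0")
    return instances * 60 / interval_seconds


if __name__ == "__main__":
    # Compare one instance with an active/active pair at the intervals
    # mentioned above (1s, 10s, 1m).
    for interval in (1, 10, 60):
        single = checks_per_minute(1, interval)
        pair = checks_per_minute(2, interval)
        print(f"every {interval}s: 1 instance = {single:g}/min, 2 instances = {pair:g}/min")
```

With two instances on a 10s schedule, the target receives 12 requests per minute instead of 6.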

That's fine. Still, I'm unsure how to properly manage the checks of both instances, because I would like to avoid having them perform the same check in the same time interval. I can easily handle this by starting the two instances at different times, but that is manual management, which I would like to avoid. Maybe centralized management would help.
I hope you understand what I mean :slight_smile:

Thanks
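
The "start the instances at different times" idea can be sketched as a small calculation. This is only an illustration under the assumption that all instances share one check interval. `staggered_offsets` is a hypothetical helper, not a Heartbeat feature, and it only computes the delays: applying them, and keeping the instances aligned afterwards, is outside the sketch.

```python
from typing import List


def staggered_offsets(instances: int, interval_seconds: float) -> List[float]:
    """Start delay in seconds for each instance so checks are evenly spaced.

    Hypothetical helper: instance i is delayed by i * (interval / instances),
    so within one interval no two instances fire at the same moment.
    """
    if instances <= 0 or interval_seconds <= 0:
        raise ValueError("instances and interval_seconds must be positive")
    step = interval_seconds / instances
    return [i * step for i in range(instances)]


if __name__ == "__main__":
    # Two instances checking every 10s: the second starts 5s after the first.
    print(staggered_offsets(2, 10))
```

For two instances on a 10s interval this gives delays of 0 and 5 seconds, so Web Service C would see one request roughly every 5 seconds rather than two at once.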

@ea1987 can you help us understand why it's so important for your app not to have overlapping checks? Is it important for performance? If so, can you quantify the impact in terms of resources?

Sorry for the late reply. In my scenario overlapping checks are not that bad, but I'm wondering what the right implementation would be when I need to perform checks on a large number of instances/servers etc. using more than 2 Heartbeat instances.
Maybe my doubts are more general and relate to a kind of Beats management platform (I'm thinking of a central Beats management server similar to the APM Agent Configuration). Is that a legitimate consideration? :slight_smile:
Thanks

I think we can definitely improve our docs here. I've opened https://github.com/elastic/uptime/issues/121 to track this need, and I'll try to write some docs over the next few weeks.
