is there another solution to monitor the uptime of several (> 100) machines when I cannot externally ping them? I assume heartbeat won't work then? I tried to count the logs of each machine. However, this way I don't notice missing machines that did not log.
I am using a hosted deployment of the elastic stack in the elastic cloud.
maybe you could activate monitoring if didn't already do so on your cluster and then use a Watcher to notify you once a node goes down?
See n example of this here. I'm sure you could adapt that to exactly fit your needs and maybe just amen sure that the cluster is at an exact machine count for example?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.