I think we're talking about the same thing. Do you know how up to date a node's knowledge of the cluster is? In other words, if another cluster node lists the node to be checked in, for example, the "_nodes" result, how long ago did it typically last try to communicate with that node?
Right now, I've implemented a kind of "ping": I issue a "_nodes//network" request and am happy if it answers at all, without caring about the exact result. The point was to use a request that is inexpensive and needs no preparation on our side, but that still raises the usual java.net exceptions, such as SocketTimeoutException, if the problem persists. All we're interested in is whether the node is reachable again, so actually checking cluster health could be too strict a condition.
However, what this approach does not take into account, I imagine, is network partitioning or split-brain: the node could be reachable but still not be part of the cluster. Our check only needs to be a necessary condition for deciding whether to give the node another try with a "real" request; it need not be sufficient in the mathematical sense of the word. Still, would you recommend your approach to us over ours, since it is "more sufficient"?
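To make the "ping" idea concrete, here is a minimal Java sketch of such a reachability check. It is only an illustration under assumptions: the class name NodePing, the method isReachable and the timeout parameter are hypothetical, the caller is assumed to pass a plain http URL for whatever inexpensive request is used, and only the JDK's own HttpURLConnection is involved, not any Elasticsearch client.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of any Elasticsearch API.
public class NodePing {

    /**
     * Returns true if the node answers the HTTP request at all, whatever the
     * status code or body, and false on any I/O failure.
     * Assumes 'url' is an absolute http URL.
     */
    public static boolean isReachable(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis); // limit time spent connecting
            conn.setReadTimeout(timeoutMillis);    // limit time spent waiting for the reply
            conn.getResponseCode();                // sends the request; the result is ignored
            return true;
        } catch (IOException e) {
            // Covers SocketTimeoutException, ConnectException, UnknownHostException, ...
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A caller would pass the node's URL for the cheap request together with a timeout in milliseconds, and retry the "real" request only when this returns true. As discussed above, a true result only says the node answered; it says nothing about whether the node is part of the cluster.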