Yeah, that's what happens at 5am.
I was just thinking there must be a way to get the servers to unblock instead of just waiting for timeouts for 20 minutes.
I do understand what happened, though. I just don't think there should ever be a case where the master servers are completely unresponsive and blocking all requests for status.
This might be AWS-specific, but here is what's going on.
In AWS, when using security groups, as soon as a server is shut down or goes away on its own, all requests to its IP address are dropped by the network stack without any reply to the source, because as far as the network stack is concerned that IP address is no longer part of the security group. It seems that Elasticsearch doesn't handle that scenario very well. If I just shut down the Elasticsearch software, everything is fine: the host is up, the port is closed, and the master server gets a response right away. If the host is down, the master has to wait for multiple timeouts before it considers the host down, and during that time everything is blocked.
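To illustrate the difference between the two cases, here is a rough Python sketch (not anything from Elasticsearch itself). The `probe` helper and its 2-second default timeout are made up for the example. A closed port on a live host should come back as "refused" almost instantly, while a host whose packets are silently dropped should only come back as "timeout" after the full wait.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Try one TCP connect and classify the outcome.

    Hypothetical helper for illustration only. Returns a tuple of
    (result, elapsed_seconds), where result is one of
    'open', 'refused', 'timeout', or 'error'.
    """
    start = time.monotonic()
    try:
        # create_connection raises if the connect fails or times out
        with socket.create_connection((host, port), timeout=timeout):
            result = "open"
    except ConnectionRefusedError:
        # Host is up, port is closed: the peer answered with a reset
        result = "refused"
    except socket.timeout:
        # No reply at all within the timeout, e.g. packets dropped
        result = "timeout"
    except OSError:
        # Anything else (unreachable network, name resolution, ...)
        result = "error"
    return result, time.monotonic() - start


if __name__ == "__main__":
    # Port 9300 is the usual Elasticsearch transport port; adjust as needed.
    print(probe("127.0.0.1", 9300))
```

Running this against a node where only the Elasticsearch process is stopped versus a node that has been terminated should show the gap: the first answers immediately, the second takes the whole timeout, and a caller that retries several times in a row multiplies that delay.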