A node never recovers from a bad state

21:40:00 INFO cluster.routing.allocation i31@z0 [2017-06-19 21:40:00,618][INFO ][cluster.routing.allocation] [instance-0000000031] Cluster health status changed from [GREEN] to [YELLOW] (reason: [[{instance-0000000030}{LgLGRrQzRzeVp4uulJcmOg}{}{}{logical_availability_zone=zone-0, availability_zone=us-east-1e, max_local_storage_nodes=1, region=us-east-1, master=true}] failed]).

The node instance-0000000030 in cluster f58ab87 went down on the afternoon of 6/19 and had still not recovered as of 13:00 on 6/20. I am wondering whether that caused one of the nodes to run at high CPU load (>90%) for over 6 hours, with no sign of the load coming down.

