As I mentioned in another thread, it would be useful to see what these nodes are so busy doing. I'd try the nodes hot threads API: GET /_nodes/hot_threads?threads=99999
. However, this runs on the management threadpool, so if that threadpool is where the problem is then it won't work. If so, please use jmap
to grab a thread dump directly from the JVM. Also, of course, if there are any messages in the logs from the time of the problem then they would be helpful.
When you say "nodes leave the cluster" this sounds very bad, but I can't yet see how that is related. We definitely need to see logs from at least (a) the elected master and (b) the node that left for a few minutes either side of its departure. Look for a log message from the MasterService
containing the string node-left
, and a log message from the ClusterApplierService
saying master node changed
.