I've recently been having trouble with a 20 node ES cluster where every
other night one or more nodes will drop out of the cluster due to failed
pings. Typically it's one or two nodes, always the same one or two, and
roughly at the same time of night. At the time these nodes are dropped, the
master seems to be doing a lot of merges and, presumably related, it has
very high disk IO (close to 100%). The nodes that are dropped also tend to
be busy with IO but not nearly as much as the master.
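Since the merges seem to be saturating disk when this happens, one thing I've been considering is capping merge I/O with store throttling. A sketch of what I'd put in elasticsearch.yml (setting names as I understand them from the 1.x docs, and the rate is just a guess for our disks, so please correct me if I have these wrong):

```yaml
# Throttle only merge I/O, not regular indexing/search traffic.
indices.store.throttle.type: merge
# 20mb/s is an illustrative value, not a recommendation.
indices.store.throttle.max_bytes_per_sec: 20mb
```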
It's also worth noting that the master logs show the debug output "using
[concurrent] merge scheduler with max_thread_count" on the nights when a
problem occurs, at roughly the time of the failures. It's not clear to me
what that message indicates or why it's logged at that moment, since
presumably the same scheduler has been responsible for every prior merge
that day. Is it being reset? Is it starting some kind of larger merge?
Seems very coincidental.
So, any thoughts on what might be going wrong or how to address it? I
could just extend the failure detection ping timeouts, but that seems like
it's hiding the symptom without tracking down the cause.
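For reference, the timeout change I have in mind is roughly the following (these are the zen discovery fault-detection settings as I understand them; I believe the defaults are 1s interval, 30s timeout, 3 retries, but I may be misremembering):

```yaml
# Fault-detection pings between master and nodes (zen discovery).
discovery.zen.fd.ping_interval: 1s
# Doubling timeout and retries from the defaults -- but this just
# masks the IO stalls rather than fixing whatever causes them.
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 6
```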