Preventing SearchPhaseExecutionExceptions during rolling restart of cluster nodes

amishar · February 23, 2021, 9:21am

Several scroll search requests fail with SearchPhaseExecutionException during a rolling restart of the nodes in the cluster (ES 6.8). Each SearchPhaseExecutionException has an underlying cause that points to a disconnect between nodes within the cluster (due to the rolling restart). The root cause varies: IllegalStateException("node xxx not available") [1], NodeNotConnectedException, or NodeDisconnectedException.

I would expect the transport client to automatically retry on another node while the current node is down. My understanding is that these exceptions are not retried because they occur in the fetch phase, and the only option is for the client to retry the search query. Is this correct? What is the recommended way to handle these exceptions?

[1] elasticsearch/SearchScrollAsyncAction.java at 6.8 · elastic/elasticsearch · GitHub

DavidTurner · February 23, 2021, 12:06pm

That sounds correct, yes.

I think you will need to try again yourself. The client will not retry automatically.
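
For illustration only, here is a minimal sketch of what retrying on the client side could look like. It is not taken from the Elasticsearch codebase: `ScrollRetry`, `withRetry`, and `causedByNodeDisconnect` are hypothetical names, the attempt count and backoff are arbitrary, and the exception matching is done by class name so the example compiles without the Elasticsearch jars. It also assumes the disconnect is reachable through the `getCause()` chain of the exception you catch.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical client-side retry helper; not part of any Elasticsearch API. */
public final class ScrollRetry {
    private ScrollRetry() {}

    /**
     * Runs the search, retrying with exponential backoff while the failure
     * is classified as retryable and attempts remain.
     */
    public static <T> T withRetry(Callable<T> search,
                                  Predicate<Exception> retryable,
                                  int maxAttempts,
                                  long initialBackoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return search.call();
            } catch (Exception e) {
                // Give up on the last attempt or on a non-retryable failure.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2; // double the wait before the next attempt
            }
        }
    }

    /**
     * Walks the cause chain looking for the failures named in this thread.
     * Matching on simple class names is an assumption made to keep the
     * sketch dependency-free; with the client on the classpath you would
     * normally use instanceof checks instead.
     */
    public static boolean causedByNodeDisconnect(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String name = c.getClass().getSimpleName();
            if (name.equals("NodeNotConnectedException")
                    || name.equals("NodeDisconnectedException")) {
                return true;
            }
            if (c instanceof IllegalStateException
                    && c.getMessage() != null
                    && c.getMessage().contains("not available")) {
                return true;
            }
        }
        return false;
    }
}
```

You would wrap whatever issues the search in the `Callable`, for example `ScrollRetry.withRetry(() -> runMySearch(), ScrollRetry::causedByNodeDisconnect, 5, 500L)`, where `runMySearch` is a placeholder for your own code. Since a scroll context held on a restarted node is probably gone, the callable would likely need to re-issue the original search from the start rather than reuse the old scroll ID, so the caller has to tolerate seeing documents again.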

