Preventing SearchPhaseExecutionExceptions during rolling restart of cluster nodes

Several scroll search requests fail with SearchPhaseExecutionException during a rolling restart of the nodes in the cluster (ES 6.8). These exceptions have an underlying cause that points to a disconnect between nodes in the cluster (due to the rolling restart). The root cause is one of IllegalStateException("node xxx not available") [1], NodeNotConnectedException, or NodeDisconnectedException.

I would expect the transport client to automatically retry on another node while the current node is down. My understanding is that these exceptions are not retried because they occur in the fetch phase, so the only option is for the client to retry the search query. Is this correct? What is the recommended way to handle these exceptions?

[1] elasticsearch/SearchScrollAsyncAction.java at 6.8 · elastic/elasticsearch · GitHub

That sounds correct, yes.

I think you will need to try again yourself. The client will not retry automatically.
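For what it's worth, a client-side retry could look roughly like the sketch below. It is only an illustration: ScrollRetry and withRetry are hypothetical names, not part of the Elasticsearch client API, and the code uses plain JDK types so it does not depend on any particular client version. It assumes the failure surfaces as a RuntimeException and that the caller decides which exceptions are worth retrying.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Elasticsearch client API.
final class ScrollRetry {

    // Runs the task and retries it when it throws an exception that the
    // caller-supplied predicate marks as retryable. Gives up after
    // maxAttempts calls and rethrows the last exception.
    static <T> T withRetry(Supplier<T> task,
                           Predicate<RuntimeException> retryable,
                           int maxAttempts,
                           long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                // Out of attempts, or not a failure we want to retry.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                try {
                    // Fixed pause to give the restarting node time to rejoin.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and surface the original failure.
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

In your case the task would be the code that issues the search and pages through the scroll, and the predicate would check for SearchPhaseExecutionException. I am assuming here that the retry has to start again from the initial search request rather than resume the failed scroll, so the task should be safe to run more than once.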
