Several scroll search requests fail with SearchPhaseExecutionException during a rolling restart of the nodes in the cluster (ES 6.8). Each SearchPhaseExecutionException has an underlying cause that points to a disconnect between nodes within the cluster (due to the rolling restart). The root cause varies: IllegalStateException("node xxx not available") [1], NodeNotConnectedException, or NodeDisconnectedException.
I would expect the transport client to automatically retry on another node while the current node is down. My understanding is that these exceptions are not retried because they occur in the fetch phase, and that the only option is for the client to retry the search query. Is this correct? What is the recommended way to handle these exceptions?
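For context, the client-side retry I have in mind is roughly the sketch below. It assumes a failed scroll cannot be resumed, so the whole scroll (initial search plus all scroll pages) is started again from the beginning. `ScrollRetry`, `runWithRestart`, and the backoff parameter are hypothetical names of my own, not part of the Elasticsearch client API, and for simplicity the sketch treats every exception as retryable.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of the Elasticsearch client API.
final class ScrollRetry {

    private ScrollRetry() {}

    /**
     * Runs wholeScroll (assumed to wrap the initial search and all scroll pages)
     * and restarts it from the beginning if it throws, up to maxAttempts times.
     */
    static <T> T runWithRestart(Callable<T> wholeScroll, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return wholeScroll.call();
            } catch (InterruptedException e) {
                // Do not retry when the calling thread was interrupted.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                // Assumption: any failure is retryable. A real version would
                // inspect the cause chain for node connectivity errors only.
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff to give the restarting node time to rejoin.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

The `Callable` would wrap the code that issues the search request and loops over the scroll pages. Since a restart begins from the first page, hits that were already processed may be seen again, so downstream processing would need to tolerate duplicates.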
[1] elasticsearch/SearchScrollAsyncAction.java at 6.8 · elastic/elasticsearch · GitHub