TransportClient stuck until disconnecting from node

I am working with Elasticsearch 5.4.3 and index events via the transport client.
I have a cluster of 3 nodes with 1 primary shard and 2 replicas.
During a test, node-2 was shut down.
At one specific point in time, a call that indexes a document via the transport client did not return for a very long time (15 minutes): builder.execute().actionGet();
Other threads executed the same line of code during that period and got responses successfully.
After 15 minutes, the stuck thread finally got a response, right after the client wrote this to the log:

DEBUG 2019-06-20 21:45:39,721 [elasticsearch[_client_][generic][T#4]] : Netty4Transport(TcpTransport.closeAndNotify:605) - disconnecting from [{node-2}{jfhMi92vThOz0x801XnniA}{1fN_YK1oR1KOyC-swbLiNw}{node-2}{}], IOException[Connection timed out]
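For context, a bounded wait on the client side is separate from the request's own timeout: builder.execute().actionGet() with no argument waits for as long as the response takes. Below is a minimal sketch of the bounded alternative. It uses only java.util.concurrent as a stand-in for the transport client's future; the class BoundedWait and the helper getWithTimeout are hypothetical names for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {

    // Waits at most timeoutMillis for the future; returns null on timeout
    // instead of blocking the calling thread indefinitely.
    static <T> T getWithTimeout(CompletableFuture<T> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Stop waiting. In a real client the request itself may still
            // complete (or already have been applied) on the server.
            future.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates a response that never arrives, like a request whose
        // connection to a dead node has not been closed yet.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        // Simulates a normal, already-completed response.
        CompletableFuture<String> ok = CompletableFuture.completedFuture("indexed");

        System.out.println(getWithTimeout(ok, 100));    // prints "indexed"
        System.out.println(getWithTimeout(stuck, 100)); // prints "null" after ~100 ms
    }
}
```

In the 5.x Java API, the future returned by execute() also has timed overloads such as actionGet(TimeValue) and actionGet(long, TimeUnit). As far as I know, they throw ElasticsearchTimeoutException when the wait expires, so something like builder.execute().actionGet(TimeValue.timeValueSeconds(30)) should bound the wait. Giving up on the wait does not undo the request, though: a bulk request that timed out on the client may still have been applied on the cluster.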

The node sampler writes to the log every 5 seconds:

DEBUG 2019-06-23 13:58:34,240 [elasticsearch[_client_][generic][T#1]] : TransportClientNodesService(TransportClientNodesService$SimpleNodeSampler.doSample:432) - failed to connect to node [{#transport#-2}{kBEaZTXMTyCodE1w9RR_qg}{node-2}{}], ignoring...

I don't use the sniffing node sampler, and I set both a timeout and waitForActiveShards on the bulkRequestBuilder.
I wait for 2 active shards, and 2 shard copies were still active because only 1 node was shut down.
I also want to emphasize that other threads did get responses at around that time, so maybe this is a corner case?
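The shard arithmetic above can be checked with a tiny sketch. It assumes, as Elasticsearch enforces, that no two copies of the same shard are allocated to the same node, so each of the 3 nodes holds one of the 3 copies. The class and method names are hypothetical. My understanding is that the timeout on a bulk request governs how long the primary waits for the required active shards, not how long the client waits for the network round-trip. That would be consistent with the timeout not helping here.

```java
public class ActiveShardCheck {

    // Total shard copies = 1 primary + number of replicas.
    static int totalCopies(int replicas) {
        return 1 + replicas;
    }

    // Copies still active when each copy lives on a different node and
    // nodesDown of those nodes are unavailable.
    static int activeCopies(int replicas, int nodesDown) {
        return Math.max(0, totalCopies(replicas) - nodesDown);
    }

    // A write that requires 'required' active copies can start right away
    // only if at least that many copies are active.
    static boolean canStartWrite(int active, int required) {
        return active >= required;
    }

    public static void main(String[] args) {
        int active = activeCopies(2, 1);
        System.out.println(active);                   // prints 2
        System.out.println(canStartWrite(active, 2)); // prints true
    }
}
```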

How can I configure a timeout for this kind of situation, or configure my transport client to handle it without getting stuck?
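One way to keep a caller from getting stuck is to bound each attempt and retry with a fresh request. The sketch below assumes this approach; it is not a built-in transport client feature. As I understand it, the transport client spreads requests round-robin across its connected nodes, so a retry will typically go to a different node. The names BoundedRetry and callWithRetry are hypothetical, and CompletableFuture again stands in for the client's future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class BoundedRetry {

    // Issues a new request via 'call' up to maxAttempts times, waiting at most
    // perAttemptMillis for each one. Returns the first successful result, or
    // null if every attempt timed out or failed.
    static <T> T callWithRetry(Supplier<CompletableFuture<T>> call,
                               int maxAttempts, long perAttemptMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<T> future = call.get();
            try {
                return future.get(perAttemptMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // Abandon this attempt and move on to a fresh request.
                future.cancel(true);
            }
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // The first attempt never completes (like a request sent to a dead
        // node); the second completes immediately.
        Supplier<CompletableFuture<String>> flaky = () ->
            ++calls[0] == 1 ? new CompletableFuture<String>()
                            : CompletableFuture.completedFuture("indexed");

        System.out.println(callWithRetry(flaky, 3, 100)); // prints "indexed"
        System.out.println(calls[0]);                     // prints 2
    }
}
```

A caveat on this design: a request abandoned on the client side may still have been executed, so retrying a bulk request can create duplicate documents unless each document has an explicit ID.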
