Request to Elasticsearch cluster hangs


#1

Hi,
we have a 3-node cluster (say abc, pqr, xyz). Sometimes a search request we send hangs and we never get any response back.

When we check the Elasticsearch logs on abc (the master), we see the following:

[NodeName-{abc}] [index_name][1] received shard failed for [index_name][1], node[N9_z7xYSSO6E4-W6DqEDVA], [R], s[INITIALIZING], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-09-13T06:13:53.463Z], details[shard failure [failed recovery][RecoveryFailedException[[index_name][1]: Recovery failed from [NodeName-pqr][nzkNy894SMe9FJBNv45k1Q][pqr][inet[/20.222.146.196:9300]]{master=true} into [NodeName-{abc}][GRBa3JlKRSybCTQgu76jvQ][abc][inet[/20.222.146.221:9300]]{master=true}]; nested: RemoteTransportException[[NodeName-pqr][inet[/20.222.146.196:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[index_name][1] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[index_name][1] Failed to transfer [299] files with total size of [119.1gb]]; nested: ReceiveTimeoutTransportException[[NodeName-{abc}][inet[/20.222.146.221:9300]][internal:index/shard/recovery/clean_files] request_id [371207311] timed out after [900000ms]]; ]]], indexUUID [FImaN7b3RriRIT55eeeJXw], reason [Failed to perform [indices:data/write/delete] on replica, message [NodeDisconnectedException[[NodeName-{xyz}][inet[/20.222.146.220:9300]][indices:data/write/delete[r]] disconnected]]]
[2016-09-13 01:26:54,256][WARN ][cluster.action.shard ] [NodeName-{abc}] [index_name][2] received shard failed for [index_name][2], node[N9_z7xYSSO6E4-W6DqEDVA], [R], s[STARTED], indexUUID [FImaN7b3RriRIT55eeeJXw], reason [Failed to perform [indices:data/write/index] on replica, message [SendRequestTransportException[[NodeName-{xyz}][inet[/20.222.146.220:9300]][indices:data/write/index[r]]]; nested: NodeNotConnectedException[[NodeName-{xyz}][inet[/20.222.146.220:9300]] Node not connected]; ]]

and on node pqr we see the following logs:

[action.admin.cluster.health] [NodeName-pqr] connection exception while trying to forward request to master node [[NodeName-{abc}][GRBa3JlKRSybCTQgu76jvQ][abc][inet[/20.222.146.221:9300]]{master=true}], scheduling a retry. Error: [org.elasticsearch.transport.NodeDisconnectedException: [NodeName-{abc}][inet[/20.222.146.221:9300]][cluster:monitor/health] disconnected]
[2016-09-13 01:14:26,234][WARN ][search.action ] [NodeName-pqr] Failed to send release search context
org.elasticsearch.transport.SendRequestTransportException: [NodeName-{abc}][inet[/20.222.146.221:9300]][indices:data/read/search[free_context]]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:249)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendFreeContext(SearchServiceTransportAction.java:143)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.sendReleaseSearchContext(TransportSearchTypeAction.java:353)
at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$1.onFailure(TransportSearchDfsQueryThenFetchAction.java:123)
at org.elasticsearch.search.action.SearchServiceTransportAction$8.handleException(SearchServiceTransportAction.java:283)
at org.elasticsearch.transport.TransportService$3.run(TransportService.java:290)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [NodeName-{abc}][inet[/20.222.146.221:9300]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:964)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:656)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:276)
... 9 more
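
The ReceiveTimeoutTransportException above shows the clean_files recovery step timing out after 900000ms (15 minutes) while transferring 299 files totalling 119.1gb. One thing we are considering is raising the recovery timeouts and the recovery throttle in elasticsearch.yml so large shards can finish copying (a sketch only; the setting names are what we believe apply to our 1.x version, please correct us if they differ):

```yaml
# elasticsearch.yml -- recovery tuning (sketch; verify setting names for your ES version)
# Give internal recovery actions such as clean_files more time than the 15m default.
indices.recovery.internal_action_timeout: 30m
# Raise the recovery throttle (default 20mb/s) so a ~119gb shard copies faster.
indices.recovery.max_bytes_per_sec: 100mb
```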

Can somebody help us understand why this is happening? On the client side there are no timeouts or errors; the request just hangs.
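
As a workaround, would setting a timeout on the search request itself at least stop the client from hanging forever? Something like this (a sketch, untested on our side; as we understand it the timeout is applied per shard and may return partial results with timed_out=true):

```json
POST /index_name/_search
{
  "timeout": "30s",
  "query": { "match_all": {} }
}
```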

