ES 5 upgrade: "Received response for a request that has timed out"


(Alvaro Cabrera) #1

I recently upgraded from ES 2.4.1 to ES 5.0. Shortly after, I started to notice that nodes were dropping from the cluster and/or behaving erratically. On all my nodes I see the errors below shortly after the nodes start up:

[2016-11-03T10:10:44,929][WARN ][o.e.a.a.c.n.i.TransportNodesInfoAction] [essep01] ignoring unexpected response [null] of type [null], expected [NodeInfo] or [FailedNodeException]
[2016-11-03T10:11:03,883][WARN ][o.e.t.TransportService ] [essep01] Received response for a request that has timed out, sent [36539ms] ago, timed out [6539ms] ago, action [internal:discovery/zen/fd/master_ping], node [{essep03}{XW_QxbVpRwSir7QEIyTJJg}{PF7FuFwzTSqqNQ7QUt-wyQ}{192.168.10.88}{192.168.10.88:9300}], id [12485911]
[2016-11-03T10:11:31,813][WARN ][o.e.t.TransportService ] [essep01] Received response for a request that has timed out, sent [48886ms] ago, timed out [46886ms] ago, action [cluster:monitor/nodes/info[n]], node [{essep03}{XW_QxbVpRwSir7QEIyTJJg}{PF7FuFwzTSqqNQ7QUt-wyQ}{192.168.10.88}{192.168.10.88:9300}], id [12485914]
[2016-11-03T10:12:11,777][WARN ][o.e.t.TransportService ] [essep01] Received response for a request that has timed out, sent [39118ms] ago, timed out [9118ms] ago, action [internal:discovery/zen/fd/master_ping], node [{essep03}{XW_QxbVpRwSir7QEIyTJJg}{PF7FuFwzTSqqNQ7QUt-wyQ}{192.168.10.88}{192.168.10.88:9300}], id [12486787]
[2016-11-03T10:13:17,141][WARN ][o.e.t.TransportService ] [essep01] Received response for a request that has timed out, sent [36637ms] ago, timed out [6636ms] ago, action [internal:discovery/zen/fd/master_ping], node [{essep03}{XW_QxbVpRwSir7QEIyTJJg}{PF7FuFwzTSqqNQ7QUt-wyQ}{192.168.10.88}{192.168.10.88:9300}], id [12487505]
[2016-11-03T10:14:24,595][WARN ][o.e.t.TransportService ] [essep01] Received response for a request that has timed out, sent [38975ms] ago, timed out [8975ms] ago, action [internal:discovery/zen/fd/master_ping], node [{essep03}{XW_QxbVpRwSir7QEIyTJJg}{PF7FuFwzTSqqNQ7QUt-wyQ}{192.168.10.88}{192.168.10.88:9300}], id [12488702]
[2016-11-03T10:15:30,329][WARN ][o.e.t.TransportService ] [essep01] Received response for a request that has timed out, sent [39101ms] ago, timed out [9101ms] ago, action [internal:discovery/zen/fd/master_ping], node [{essep03}{XW_QxbVpRwSir7QEIyTJJg}{PF7FuFwzTSqqNQ7QUt-wyQ}{192.168.10.88}{192.168.10.88:9300}], id [12489606]
[2016-11-03T10:15:46,948][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [essep01] failed to execute on node [XW_QxbVpRwSir7QEIyTJJg]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [essep03][192.168.10.88:9300][cluster:monitor/nodes/info[n]] request_id [12490115] timed out after [2001ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:840) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:444) [elasticsearch-5.0.0.jar:5.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
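One way to read these warnings: the remote node did respond, but only after the request's timeout had already fired. Subtracting the two reported durations gives the timeout that was in effect (here 36539 − 6539 = 30000 ms for the master fault-detection ping, and 2000 ms for the nodes-info request, matching the 2001 ms in the stack trace). A small Python sketch for pulling those numbers out of a log line; the helper name and tuple layout are mine, not anything from Elasticsearch:

```python
import re

# Matches the durations and action in a TransportService
# "Received response for a request that has timed out" warning.
TIMEOUT_RE = re.compile(
    r"sent \[(\d+)ms\] ago, timed out \[(\d+)ms\] ago, "
    r"action \[(?P<action>[^\]]+)\]"
)

def parse_timeout_warning(line):
    """Return (sent_ms, timed_out_ms, effective_timeout_ms, action), or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    sent_ms = int(m.group(1))
    timed_out_ms = int(m.group(2))
    # sent - timed_out = the timeout that was configured for this request
    return sent_ms, timed_out_ms, sent_ms - timed_out_ms, m.group("action")
```

Running it over the warnings above shows the same 30 s effective timeout on every master_ping, which points at slow responses rather than a misconfigured timeout.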

Please help


(Alvaro Cabrera) #2

I noticed that only one node in the cluster has this behavior:

[2016-11-03T13:37:03,284][INFO ][o.e.n.Node ] [essep03] starting ...
[2016-11-03T13:37:03,408][INFO ][o.e.t.TransportService ] [essep03] publish_address {192.168.10.88:9300}, bound_addresses {192.168.10.88:9300}, {[::1]:9300}, {127.0.0.1:9300}
[2016-11-03T13:37:03,412][INFO ][o.e.b.BootstrapCheck ] [essep03] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2016-11-03T13:37:33,430][WARN ][o.e.n.Node ] [essep03] timed out while waiting for initial discovery state - timeout: 30s
[2016-11-03T13:37:33,446][INFO ][o.e.h.HttpServer ] [essep03] publish_address {192.168.10.88:9200}, bound_addresses {192.168.10.88:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2016-11-03T13:37:33,446][INFO ][o.e.n.Node ] [essep03] started
[2016-11-03T13:38:06,478][INFO ][o.e.d.z.ZenDiscovery ] [essep03] failed to send join request to master [{essep02}{87WsenQBTLiL7Hjd-dWXUA}{fAQLvpE4S9CKDYbrYsY7aw}{192.168.10.81}{192.168.10.81:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2016-11-03T13:39:09,488][INFO ][o.e.d.z.ZenDiscovery ] [essep03] failed to send join request to master [{essep02}{87WsenQBTLiL7Hjd-dWXUA}{fAQLvpE4S9CKDYbrYsY7aw}{192.168.10.81}{192.168.10.81:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.];
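For reference, the join retries above are governed by the Zen discovery timeouts; if the master is slow to respond, these are the knobs involved. A sketch of the relevant 5.x settings in elasticsearch.yml, with their default values (shown for orientation, not as a recommendation to change them):

```yaml
# elasticsearch.yml -- Zen discovery timeouts (Elasticsearch 5.x)
discovery.zen.ping_timeout: 3s    # how long to wait for pings during master election
discovery.zen.join_timeout: 60s   # how long a join request may take (default 20x ping_timeout)
```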


(Alexander Reelsen) #3

Hey,

Is this behaviour happening all the time, or only once? Does the node join the cluster at all? Can you ping the other Elasticsearch nodes from the erratic one, and are the response times good?
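A quick way to answer the latency question is to time a TCP connect to the other nodes' transport port. A minimal Python sketch; the function name and defaults are my own, not an Elasticsearch API:

```python
import socket
import time

def transport_latency_ms(host, port=9300, timeout=5.0):
    """Time a TCP connect to a node's transport port, in milliseconds.

    A rough stand-in for "can you reach the node and are response times
    good"; it only measures the connect handshake, not request latency.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return (time.monotonic() - start) * 1000.0

if __name__ == "__main__":
    # Example against the node from the logs above; adjust to your cluster.
    print(f"{transport_latency_ms('192.168.10.88'):.1f} ms")
```

If the connect itself takes hundreds of milliseconds, or times out intermittently, the problem is likely in the network rather than in Elasticsearch.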

--Alex


(Alvaro Cabrera) #4

Actually, the cluster healed itself. I just had to let all the shards relocate and then the node joined... weird behavior.


(Maxence Dunnewind) #5

I get the exact same behavior on 5.0.0 and 5.0.1. Whenever I restart a node, I get this in the logs:

[2016-11-16T10:02:03,895][INFO ][o.e.n.Node               ] [es-log2-data1] started
[2016-11-16T10:02:36,926][INFO ][o.e.d.z.ZenDiscovery     ] [es-log2-data1] failed to send join request to master [{es-log3-data1}{pdNDaSyHQLeYB8aH2F0buw}{_Mps7c9_Sj6s3JQEzgBFUA}{10.21.1.143}{10.21.1.143:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested:     TimeoutException[Timeout waiting for task.]; ]
[2016-11-16T10:03:39,939][INFO ][o.e.d.z.ZenDiscovery     ] [es-log2-data1] failed to send join request to master [{es-log3-data1}{pdNDaSyHQLeYB8aH2F0buw}{_Mps7c9_Sj6s3JQEzgBFUA}{10.21.1.143}{10.21.1.143:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2016-11-16T10:04:42,953][INFO ][o.e.d.z.ZenDiscovery     ] [es-log2-data1] failed to send join request to master [{es-log3-data1}{pdNDaSyHQLeYB8aH2F0buw}{_Mps7c9_Sj6s3JQEzgBFUA}{10.21.1.143}{10.21.1.143:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2016-11-16T10:05:45,968][INFO ][o.e.d.z.ZenDiscovery     ] [es-log2-data1] failed to send join request to master [{es-log3-data1}{pdNDaSyHQLeYB8aH2F0buw}{_Mps7c9_Sj6s3JQEzgBFUA}{10.21.1.143}{10.21.1.143:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2016-11-16T10:06:07,251][INFO ][o.e.d.z.ZenDiscovery     ] [es-log2-data1] failed to send join request to master [{es-log3-data1}{pdNDaSyHQLeYB8aH2F0buw}{_Mps7c9_Sj6s3JQEzgBFUA}{10.21.1.143}{10.21.1.143:9300}], reason [NodeDisconnectedException[[es-log3-data1][10.21.1.143:9300][internal:discovery/zen/join] disconnected]]
[2016-11-16T10:06:13,166][INFO ][o.e.c.s.ClusterService   ] [es-log2-data1] detected_master {web4-master} 

Not sure if it's related, but I had a watch on _cluster/health, and it seems the request got stuck at the time my cluster node joined (I just restarted my watch and it was fine).

Also, the node joined right after the cluster state switched from red to yellow.
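For anyone wanting to reproduce that kind of watch, a minimal poller against the cluster health endpoint might look like this. The base URL and interval are placeholders for your own setup:

```python
import json
import time
import urllib.request

def cluster_health(base_url="http://localhost:9200"):
    """Fetch _cluster/health as a dict; base_url is a placeholder."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def watch_health(base_url="http://localhost:9200", interval=5.0):
    """Print status and node count every `interval` seconds, like a simple watch."""
    while True:
        health = cluster_health(base_url)
        print(health["status"], health["number_of_nodes"])
        time.sleep(interval)

if __name__ == "__main__":
    watch_health()
```

Watching number_of_nodes alongside status makes the moment the restarted node rejoins visible, which is exactly the transition described above (the node joining right as the cluster went from red to yellow).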


(system) #6