Hi,
I am using Elasticsearch 2.4.0 with a 9-node cluster (3 master, 4 data, and 2 client nodes). I am seeing the "Transport response handler not found" warning on almost all of the data nodes very frequently, and after that shards go into an unassigned state.
Shard recovery then takes a very long time. I am not sure what is happening here. Please let me know what is causing the transport issue and how to fix it. Relevant log lines and the recovery failure stack traces are below.
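For reference, this is roughly how I am checking shard and recovery state while the problem is happening. It is just a small sketch against the standard _cat and cluster health APIs; the host/port is a placeholder for one of my client nodes, so adjust it for your own setup:

import json
import urllib.request

# Placeholder address of a client node; replace with your own host/port.
ES = "http://localhost:9200"

def get(path):
    # Simple helper that returns the response body of a GET request as text.
    with urllib.request.urlopen(ES + path) as resp:
        return resp.read().decode("utf-8")

# Shard state per index, including the reason a shard is unassigned.
print(get("/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"))

# Only the recoveries that are currently in progress, with their stage and progress.
print(get("/_cat/recovery?v&active_only=true"))

# Overall cluster health (status, number of unassigned shards, etc.).
print(json.dumps(json.loads(get("/_cluster/health")), indent=2))

The _cat/recovery output is what shows the replica recoveries from ITTESPROD-DATA1 to ITTESPROD-DATA3 sitting with no progress until they time out, as in the stack traces below.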
[2016-11-20 18:02:42,484][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261257]
[2016-11-20 18:02:42,500][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261254]
[2016-11-20 18:02:42,510][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261256]
[2016-11-20 18:02:42,522][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261258]
[2016-11-20 18:02:42,538][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261260]
[2016-11-20 18:02:42,553][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261259]
[2016-11-20 18:02:42,581][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261261]
[2016-11-20 18:02:42,581][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261262]
[2016-11-20 18:02:42,581][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261263]
[2016-11-20 18:02:42,602][WARN ][transport ] [ITTESPROD-DATA3] Transport response handler not found of id [20261264]
[[tracemessages][3]] marking and sending shard failed due to [failed recovery]
RecoveryFailedException[[tracemessages][3]: Recovery failed from {ITTESPROD-DATA1}{gxdK8boTQ4iR6blxwdsh3Q}{10.158.36.211}{10.158.36.211:9300}{master=false} into {ITTESPROD-DATA3}{gb6fWppVTKSXJKWo4SwrUw}{10.158.36.204}{10.158.36.204:9300}{master=false} (no activity after [30m])]; nested: ElasticsearchTimeoutException[no activity after [30m]];
at org.elasticsearch.indices.recovery.RecoveriesCollection$RecoveryMonitor.doRun(RecoveriesCollection.java:235)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ElasticsearchTimeoutException[no activity after [30m]]
... 5 more
[2016-12-15 12:36:44,566][WARN ][indices.cluster ] [ITTESPROD-DATA3] [[tracemessages][1]] marking and sending shard failed due to [failed recovery]
RecoveryFailedException[[tracemessages][1]: Recovery failed from {ITTESPROD-DATA1}{gxdK8boTQ4iR6blxwdsh3Q}{10.158.36.211}{10.158.36.211:9300}{master=false} into {ITTESPROD-DATA3}{gb6fWppVTKSXJKWo4SwrUw}{10.158.36.204}{10.158.36.204:9300}{master=false} (no activity after [30m])]; nested: ElasticsearchTimeoutException[no activity after [30m]];
at org.elasticsearch.indices.recovery.RecoveriesCollection$RecoveryMonitor.doRun(RecoveriesCollection.java:235)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ElasticsearchTimeoutException[no activity after [30m]]
... 5 more