We are running Elasticsearch 1.5.2 on five servers with 128 GB of memory each, using Java 1.7.0_67.
After a cluster recovery, the cluster can no longer assign replica shards. (Cluster recoveries have occurred before, mostly caused by long GC pauses.)
The error is shown below:
[2016-02-29 13:08:01,111][WARN ][indices.cluster ] [10.209.240.31] [[idx2-20160228][1]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [idx2-20160228][1]: Recovery failed from [10.209.240.11][8bDsBWlsSYGQvltZQgFrdA][CDM3E01-209240011][inet[/10.209.240.11:11300]] into [10.209.240.31][HljL5MgpQtKWCEMX0nSCSA][CDM3E02-209240031][inet[/10.209.240.31:11300]]
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:274)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:69)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:550)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.RemoteTransportException: [10.209.240.11][inet[/10.209.240.11:11300]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [idx2-20160228][1] Phase[1] Execution failed
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:842)
at org.elasticsearch.index.shard.IndexShard.recover(IndexShard.java:699)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:125)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:49)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:146)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:132)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:277)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [idx2-20160228][1] Failed to transfer [118] files with total size of [3.1gb]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:413)
at org.elasticsearch.index.engine.InternalEngine.recover(InternalEngine.java:837)
... 10 more
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [10.209.240.31][inet[/10.209.240.31:11300]][internal:index/shard/recovery/file_chunk] request_id [649263751] timed out after [60ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
... 3 more
Suppressed: org.elasticsearch.transport.ReceiveTimeoutTransportException: [10.209.240.31][inet[/10.209.240.31:11300]][internal:index/shard/recovery/file_chunk] request_id [649263739] timed out after [60ms]
... 4 more
Suppressed: org.elasticsearch.transport.ReceiveTimeoutTransportException: [10.209.240.31][inet[/10.209.240.31:11300]][internal:index/shard/recovery/file_chunk] request_id [649263731] timed out after [60ms]
... 4 more
Suppressed: org.elasticsearch.transport.ReceiveTimeoutTransportException: [10.209.240.31][inet[/10.209.240.31:11300]][internal:index/shard/recovery/file_chunk] request_id [649263727] timed out after [60ms]
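To get a quick overview of which transport actions are timing out and with what timeout value, the log can be tallied with a small script. This is only a minimal sketch that assumes the message format seen in the trace above; `summarize_timeouts` and `TIMEOUT_RE` are hypothetical names, not part of Elasticsearch.

```python
import re
from collections import Counter

# Matches timeout messages in the format shown in the trace above, e.g.
# "[internal:index/shard/recovery/file_chunk] request_id [649263751] timed out after [60ms]"
# (assumption: other log lines follow the same layout).
TIMEOUT_RE = re.compile(
    r"\[(?P<action>internal:[^\]]+)\] request_id \[(?P<request_id>\d+)\] "
    r"timed out after \[(?P<timeout>[^\]]+)\]"
)


def summarize_timeouts(log_text):
    """Count timed-out requests per (action, timeout) pair."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        counts[(match.group("action"), match.group("timeout"))] += 1
    return counts


if __name__ == "__main__":
    # One line copied from the trace above, used as sample input.
    sample = (
        "Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: "
        "[10.209.240.31][inet[/10.209.240.31:11300]]"
        "[internal:index/shard/recovery/file_chunk] "
        "request_id [649263751] timed out after [60ms]"
    )
    for (action, timeout), n in summarize_timeouts(sample).items():
        print(action, timeout, n)
```

Feeding it the full log file instead of the single sample line shows whether every failure is a `file_chunk` request with the same timeout.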
What should we do to fix this?
Thanks!