ES 5.1 - Shard recovery stuck in INIT - IndexShardRelocatedException: Already relocated

I had a shard peer recovery stuck on ES 5.1 for 16 hours.

  • The _recovery API showed the recovery stuck in the INIT stage
  • The shards themselves were small (~3 MB)
  • There was plenty of free space on both the source and destination nodes
  • Both the source and destination nodes involved in the peer recovery had the segment/Lucene files on disk, which shows that part of the recovery process did succeed
  • After enabling TRACE logging I am fairly sure this is a bug: I could see the source node repeatedly trying to relocate its shard to the destination and constantly failing with an error of the form

[2017-10-04T17:45:47,586][TRACE][o.e.i.r.PeerRecoveryTargetService] [wb_dtDf] [logs1][3] Got exception on recovery
org.elasticsearch.transport.RemoteTransportException: [qx_Ng9r][x.x.x.x:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.shard.IndexShardRelocatedException: CurrentState[RELOCATED] Already relocated
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun( ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( [elasticsearch-5.1.1.jar:5.1.1]
at [elasticsearch-5.1.1.jar:5.1.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$ [?:1.8.0_112]
at [?:1.8.0_112]
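
A stuck stage like this can also be spotted programmatically from the _recovery output. The sketch below is illustrative, not from the original post: it assumes a response shaped like the 5.x `GET /<index>/_recovery` JSON, and the field names (`stage`, `total_time_in_millis`, `id`) should be treated as assumptions to verify against your cluster's actual output.

```python
# Sketch: flag shards that have sat in the INIT stage for too long,
# given a parsed _recovery response. Field names mimic the ES 5.x
# GET /<index>/_recovery shape and are assumptions here.
STUCK_THRESHOLD_MS = 10 * 60 * 1000  # treat >10 minutes in INIT as stuck

def find_stuck_shards(recovery_response, threshold_ms=STUCK_THRESHOLD_MS):
    """Return (index, shard_id) pairs stuck in INIT longer than threshold."""
    stuck = []
    for index, data in recovery_response.items():
        for shard in data.get("shards", []):
            if (shard.get("stage") == "INIT"
                    and shard.get("total_time_in_millis", 0) > threshold_ms):
                stuck.append((index, shard["id"]))
    return stuck

# Toy payload resembling the situation in this post: shard 3 of logs1
# has been in INIT for ~16 hours while shard 0 recovered normally.
sample = {
    "logs1": {
        "shards": [
            {"id": 3, "stage": "INIT", "total_time_in_millis": 16 * 3600 * 1000},
            {"id": 0, "stage": "DONE", "total_time_in_millis": 4200},
        ]
    }
}

print(find_stuck_shards(sample))  # -> [('logs1', 3)]
```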

  • Based on this, I figured that deleting the partially copied index data on the destination node might let the recovery move forward, and that did fix my problem.

However, I think this is a bug: after getting 'IndexShardRelocatedException: CurrentState[RELOCATED] Already relocated' roughly every minute for 16 hours' worth of attempts, the recovery code should have internally deleted the copied shard contents on the destination node instead of retrying forever.
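
What the post is asking for could look something like a bounded retry loop that cleans up the destination's partial files once the source keeps reporting the shard as already relocated. This is a hypothetical sketch of that behaviour; the names and structure are invented for illustration and do not reflect the actual Elasticsearch recovery code:

```python
class AlreadyRelocatedError(Exception):
    """Stands in for IndexShardRelocatedException from the source node."""

def recover_with_cleanup(start_recovery, cleanup_partial_files, max_attempts=5):
    """Retry a recovery; on an 'already relocated' error, delete the
    partial files on the destination so the next attempt starts fresh,
    instead of retrying the same doomed request indefinitely."""
    for attempt in range(1, max_attempts + 1):
        try:
            return start_recovery()
        except AlreadyRelocatedError:
            # The source handed off its primary, so the segments already
            # copied to the destination belong to a dead recovery attempt.
            cleanup_partial_files()
    raise RuntimeError("recovery failed after %d attempts" % max_attempts)

# Toy demonstration: the first two attempts hit the stale source,
# the third succeeds after cleanup.
state = {"calls": 0, "cleanups": 0}

def fake_start_recovery():
    state["calls"] += 1
    if state["calls"] < 3:
        raise AlreadyRelocatedError("CurrentState[RELOCATED] Already relocated")
    return "DONE"

def fake_cleanup():
    state["cleanups"] += 1

print(recover_with_cleanup(fake_start_recovery, fake_cleanup))  # -> DONE
```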

Please raise this on GitHub so we can take a closer look.
