Very confused here. I just upgraded the RAM in one node of a 2-node cluster. The node was shut down cleanly and brought back up cleanly after the upgrade. When the node starts, shards begin reallocating from the other node, but it gets to 23 shards remaining and then stops. There is nothing I can see that is special about the shards that don't reallocate - they vary in size from about 1MB to 100GB. I have tried disabling and re-enabling allocation via cluster.routing.allocation.enable. There is nothing in the Elasticsearch logs on either of the two nodes.
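For anyone wanting the exact calls, this is a sketch of what I mean by disabling and re-enabling allocation; the allocation explain API is also worth a look to see why a shard isn't being assigned:

GET _cluster/allocation/explain

PUT _cluster/settings
{ "transient": { "cluster.routing.allocation.enable": "none" } }

PUT _cluster/settings
{ "transient": { "cluster.routing.allocation.enable": "all" } }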
I have restarted the affected node a couple of times - each time it gets stuck on the same 23 shards.
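For the record, the stuck shards are visible with the cat APIs (a sketch; this just lists shard state and the unassigned reason, plus any recoveries still in flight):

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason
GET _cat/recovery?active_only=true&v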
Ah! I changed number_of_replicas to 0 on one of the much smaller indices, then back to 1, and those shards are redistributing now. I will do the same for all affected indices and see whether it has the same effect.
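For reference, dropping and restoring the replicas is just an index settings update done per index, roughly like this (the index name here is a placeholder):

PUT my-index/_settings
{ "index": { "number_of_replicas": 0 } }

PUT my-index/_settings
{ "index": { "number_of_replicas": 1 } }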
I need to upgrade the RAM in the second node at some point, so we will get to see whether the same thing happens on that node.
org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-api-pns-2017.38][0]: Recovery failed from {elasticsearch-02}{AKGKzJ2mQqa-26zbPlEsFw}{1RXza1r5Q8Syxx924B_6cQ}{10.20.4.145}{10.20.4.145:9300} into {elasticsearch-01}{EDv6io2GRM6d5WfaA0EKwA}{o-ljKzafTMmM4qDqYxYELw}{10.20.4.143}{10.20.4.143:9300}
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:314) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:73) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:556) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.2.jar:5.5.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-02][10.20.4.145:9300][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:140) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) ~[elasticsearch-5.5.2.jar:5.5.2]
... 5 more
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [0] files with total size of [0b]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:337) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:138) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) ~[elasticsearch-5.5.2.jar:5.5.2]
... 5 more
Caused by: java.lang.IllegalStateException: try to recover [logstash-api-pns-2017.38][0] from primary shard with sync id but number of docs differ: 123810358 (elasticsearch-02, primary) vs 123810384(elasticsearch-01)
at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:226) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:138) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) ~[elasticsearch-5.5.2.jar:5.5.2]
... 5 more
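So the actual cause is in that last Caused by: the stale replica copy on elasticsearch-01 has the same sync id as the primary on elasticsearch-02, but the doc counts differ (123810358 vs 123810384), so the recovery sanity check gives up. Dropping number_of_replicas to 0 and back to 1 throws the stale copy away and forces a full recovery, which is presumably why that worked. A fresh synced flush on the index should also give the primary a new sync id that no longer matches the stale copy (untested here, and it only succeeds while the index is idle):

POST logstash-api-pns-2017.38/_flush/synced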