Try to recover [test-20181128][2] from primary shard with sync id but number of docs differ: 59432 (10.1.1.189, primary) vs 60034(10.1.1.190)

We run three ES data nodes with the JVM settings -Xmx30g -Xms30g. Each of the three servers has 128 GB of physical memory and 32 CPU cores.

The ES version is 5.4.1.

The following exception was found in the log today:
Caused by: java.lang.IllegalStateException: try to recover [test-20181128][2] from primary shard with sync id but number of docs differ: 59432 (10.1.1.189, primary) vs 60034 (10.1.1.190)
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:226) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:138) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:132) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$100(PeerRecoverySourceService.java:54) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:141) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:138) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.4.1.jar:5.4.1]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1528) ~[elasticsearch-5.4.1.jar:5.4.1]
    ... 5 more

I do not understand how the primary shard can have fewer documents than its replica copy. How does this happen?

These symptoms could be explained by any of three known issues, all of which are fixed in 6.3.0. In the meantime you can recover this index by rebuilding its replicas: set number_of_replicas to 0, wait for the replicas to be deleted, and then set it back to its current value to create them again.
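As a minimal sketch of that workaround in Python with the requests library, assuming the cluster is reachable at http://localhost:9200 and that the index originally had 1 replica (adjust the endpoint, replica count, and any security settings to your environment):

```python
import requests

ES = "http://localhost:9200"   # assumption: local, unsecured cluster endpoint
INDEX = "test-20181128"        # index named in the error message
ORIGINAL_REPLICAS = 1          # assumption: set this to your real replica count


def set_replicas(count):
    # Update index.number_of_replicas via the index settings API.
    resp = requests.put(
        f"{ES}/{INDEX}/_settings",
        json={"index": {"number_of_replicas": count}},
    )
    resp.raise_for_status()


def wait_for_green(timeout="10m"):
    # Block until the index reaches green health, i.e. all shard copies are allocated.
    resp = requests.get(
        f"{ES}/_cluster/health/{INDEX}",
        params={"wait_for_status": "green", "timeout": timeout},
    )
    resp.raise_for_status()
    return resp.json()


# Drop the replicas so the out-of-sync copies are deleted...
set_replicas(0)
wait_for_green()

# ...then recreate them; they are rebuilt from the primaries.
set_replicas(ORIGINAL_REPLICAS)
print(wait_for_green())
```

The same two settings updates can of course be issued directly with any HTTP client; the point is simply that the replicas are deleted and then copied fresh from the primaries, which removes the document-count mismatch.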


