Shard initialization stuck - RecoveryFailedException

Hi,

INITIAL PROBLEM:

It seems that the shards never manage to finish their initialization.
I keep getting the following exception in the logs:

00:00:02.100 WARN  org.elasticsearch.indices.cluster - [00848fc0-8093-4f0f-975f-256b500cb423] 
[[search-cms-repo][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [search-cms-repo][0]: Recovery failed from [160ab28e-2944-4800-b90e-f794b3099c5e][SQlcAKUfQ3G4Y6DcEOTWiw][f755503c733d][inet[/ip1:9300]]{local=false, master=true} into [00848fc0-8093-4f0f-975f-256b500cb423][a1ewguLMRiKrpisESiNpjQ][5f74740899fc][inet[/ip2:9300]]{local=false, master=true}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:274)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:69)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:550)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [160ab28e-2944-4800-b90e-f794b3099c5e][inet[/ip1:9300]][internal:index/shard/recovery/start_recovery] request_id [32348290] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
... 3 common frames omitted

When I look at the size of a problematic shard on disk, I can see that it keeps growing and then drops back to its initial size, every 5 seconds.
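For anyone who wants to watch this without inspecting the disk directly, the cat APIs expose the same information. A minimal sketch, assuming a node listening on localhost:9200 (adjust the address to your setup):

```shell
# List shard states; a problematic shard shows up as INITIALIZING/UNASSIGNED:
curl -s 'localhost:9200/_cat/shards?v'

# Poll ongoing recoveries (bytes and files copied so far) every 5 seconds:
while true; do
  curl -s 'localhost:9200/_cat/recovery?v'
  sleep 5
done
```

If the recovery restarts from scratch on each attempt, the "bytes recovered" column resets in step with the shard size on disk.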

REPRODUCED LOCALLY:

I am not sure how this happened on the real server nodes (ES 1.5.2).
But I managed to reproduce a similar situation locally, and consistently. The ES versions tried are 1.5.2 and 1.7.6, and both show the same issue (even though the exception message is slightly different).

  • To reproduce locally, I simply run a cluster of 2 nodes (ES 1.7.6).
  • I reduced the recovery speed (indices.recovery.max_bytes_per_sec) to make sure that replicating a shard takes more than 5 seconds.
  • I have 8 indices (I tried with 4 first and could never reproduce; maybe because of the way the shards are spread, but 8 always seems to do the trick).
  • To force the cluster to replicate the shards, I lower the number of replicas from 1 to 0, and then (once the shards are reallocated properly) back to 1.
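The reproduction steps above can be scripted against the cluster's REST API. A sketch, assuming a local cluster on localhost:9200 and indices named index-1 through index-8 (hypothetical names; the throttle value is arbitrary, chosen so that a shard copy takes well over 5 seconds):

```shell
# 1. Throttle recovery so that copying a shard is slow:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "indices.recovery.max_bytes_per_sec": "100kb" }
}'

# 2. Drop the replicas to 0:
curl -XPUT 'localhost:9200/index-*/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'

# 3. Wait until all shards are STARTED, then bring the replicas back,
#    forcing a full replica recovery for every shard:
curl -XPUT 'localhost:9200/index-*/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```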

Most shards managed to reach the state "STARTED", except one.
That shard keeps flipping between the states "INITIALIZING" and "UNASSIGNED".
In the logs I keep getting:

2019-03-01 14:35:34,907 WARN  org.elasticsearch.indices.cluster - [node1] [[storage-cms-repo][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [storage-cms-repo][0]: Recovery failed from [node2][J5iywdFIQ-ia4y5hZiXpBw][glenns-mbp.lab.enonic.net][inet[/127.0.0.1:9301]]{local=false, master=true} into [node1][kyynt3ovTEOk2WM7PsEMSw][glenns-mbp.lab.enonic.net][inet[/127.0.0.1:9300]]{local=false, master=true}
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:280)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$700(RecoveryTarget.java:70)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:567)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.transport.RemoteTransportException: [node2][inet[/127.0.0.1:9301]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.transport.NotSerializableTransportException: [org.elasticsearch.indices.recovery.DelayRecoveryException] source node does not have the shard listed in its state as allocated on the node;

Here are the _cat/shards outputs for each step:

Initial situation:
(screenshot: _cat/shards output)

After setting the replicas to 0:
(screenshot: _cat/shards output)

After setting the replicas back to 1:
(screenshot: _cat/shards output)

As you can see, most shards managed to reach the state "STARTED", except one.
That shard keeps flipping between the states "INITIALIZING" and "UNASSIGNED",
and the warning above keeps appearing in the logs.

When I look at the size of the problematic shard on disk, I see the same behaviour as before: it grows and then drops back to its initial size.

If I restart the problematic node, the initialization finishes:
(screenshot: _cat/shards output)
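For reference, after restarting the stuck node, the cluster health API can be used to confirm that recovery completed. A sketch, assuming a node on localhost:9200 (the timeout value is arbitrary):

```shell
# Block until every shard is allocated, or give up after 120 seconds:
curl -s 'localhost:9200/_cluster/health?wait_for_status=green&timeout=120s'

# Double-check that no shard is left INITIALIZING or UNASSIGNED:
curl -s 'localhost:9200/_cat/shards?v'
```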

1.5.2 and 1.7.6 are very old, and shard recovery has changed rather a lot since they were released. I don't know that I can even compile those versions in my development environment any more. Can you reproduce this on a supported version?

Yes, and we are currently upgrading to a newer version. I will then try again.

But in the meantime, maybe someone else has experienced this problem and understood or solved it?
I have seen multiple other mentions of this problem (for example: Cluster stuck in Initializing), but I have not found an answer yet.

Copying this link here since it helped to fix the initial problem (not the local one): https://blog.trifork.com/2015/04/08/dealing-with-nodenotavailableexceptions-in-elasticsearch/
tl;dr TCP keepalive settings

+1 to that. The reference manual echoes this advice:

Elasticsearch opens a number of long-lived TCP connections between each pair of nodes in the cluster, and some of these connections may be idle for an extended period of time. Nonetheless, Elasticsearch requires these connections to remain open, and it can disrupt the operation of the cluster if any inter-node connections are closed by an external influence such as a firewall. It is important to configure your network to preserve long-lived idle connections between Elasticsearch nodes, for instance by leaving tcp_keep_alive enabled and ensuring that the keepalive interval is shorter than any timeout that might cause idle connections to be closed, or by setting transport.ping_schedule if keepalives cannot be configured.
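On Linux, the keepalive behaviour described above is controlled by kernel sysctls. A sketch with hypothetical values; the point is that the first probe fires well before any firewall or NAT idle timeout drops the connection:

```shell
sysctl -w net.ipv4.tcp_keepalive_time=300    # first probe after 5 min idle
sysctl -w net.ipv4.tcp_keepalive_intvl=60    # then probe every 60 s
sysctl -w net.ipv4.tcp_keepalive_probes=9    # declare dead after 9 misses
```

Where the OS keepalives cannot be configured, the transport.ping_schedule setting mentioned in the quote makes Elasticsearch send its own application-level pings instead.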
