Error when using shadow replicas

Continuing the discussion from Avoiding duplicate data and work when using a shared filesystem:

I am using elasticsearch-1.5.0.

I am following the article at http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-shadow-replicas.html. I am able to create an index with shadow_replicas set to true and number_of_replicas set to 0. However, as soon as I increase number_of_replicas to any non-zero number, I get a flood of this in the logs:

[2015-05-07 13:53:29,358][WARN ][cluster.action.shard     ] [lindevelastic1] [test-2015.04.27][0] sending failed shard for [test-2015.04.27][0], node[At9q_V1rQVqXxT8m4EwVPg], [R], s[INITIALIZING], indexUUID [fXq4xQuYQkCEcKyGTJ3ZfA], reason [Failed to start shard, message [RecoveryFailedException[[test-2015.04.27][0]: Recovery failed from [lindevelastic2][LlGB__lyRCOL7cfWIZCE5g][lindevelastic2.vw.rentrak.com][inet[lindevelastic2.vw.rentrak.com/172.26.21.63:9300]]{enable_custom_paths=true, master=false} into [lindevelastic1][At9q_V1rQVqXxT8m4EwVPg][lindevelastic1.vw.rentrak.com][inet[lindevelastic1.vw.rentrak.com/172.26.21.62:9300]]{enable_custom_paths=true, master=true}]; nested: RemoteTransportException[[lindevelastic2][inet[/172.26.21.63:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[test-2015.04.27][0] Phase[2] Execution failed]; nested: RemoteTransportException[[lindevelastic1][inet[/172.26.21.62:9300]][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[[test-2015.04.27][0] failed to open index reader]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-05-07 13:53:29,358][WARN ][cluster.action.shard     ] [lindevelastic1] [test-2015.04.27][0] received shard failed for [test-2015.04.27][0], node[At9q_V1rQVqXxT8m4EwVPg], [R], s[INITIALIZING], indexUUID [fXq4xQuYQkCEcKyGTJ3ZfA], reason [Failed to start shard, message [RecoveryFailedException[[test-2015.04.27][0]: Recovery failed from [lindevelastic2][LlGB__lyRCOL7cfWIZCE5g][lindevelastic2.vw.rentrak.com][inet[lindevelastic2.vw.rentrak.com/172.26.21.63:9300]]{enable_custom_paths=true, master=false} into [lindevelastic1][At9q_V1rQVqXxT8m4EwVPg][lindevelastic1.vw.rentrak.com][inet[lindevelastic1.vw.rentrak.com/172.26.21.62:9300]]{enable_custom_paths=true, master=true}]; nested: RemoteTransportException[[lindevelastic2][inet[/172.26.21.63:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[test-2015.04.27][0] Phase[2] Execution failed]; nested: RemoteTransportException[[lindevelastic1][inet[/172.26.21.62:9300]][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[[test-2015.04.27][0] failed to open index reader]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-05-07 13:53:29,396][WARN ][cluster.action.shard     ] [lindevelastic1] [test-2015.04.27][0] received shard failed for [test-2015.04.27][0], node[akGDHoQnQAqBWS3SbOhMoQ], [R], s[INITIALIZING], indexUUID [fXq4xQuYQkCEcKyGTJ3ZfA], reason [Failed to start shard, message [RecoveryFailedException[[test-2015.04.27][0]: Recovery failed from [lindevelastic2][LlGB__lyRCOL7cfWIZCE5g][lindevelastic2.vw.rentrak.com][inet[/172.26.21.63:9300]]{enable_custom_paths=true, master=false} into [lindevelastic4][akGDHoQnQAqBWS3SbOhMoQ][lindevelastic4.vw.rentrak.com][inet[lindevelastic4.vw.rentrak.com/172.26.21.65:9300]]{enable_custom_paths=true, master=false}]; nested: RemoteTransportException[[lindevelastic2][inet[/172.26.21.63:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[test-2015.04.27][0] Phase[2] Execution failed]; nested: RemoteTransportException[[lindevelastic4][inet[/172.26.21.65:9300]][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[[test-2015.04.27][0] failed to open index reader]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/3/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/3/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-05-07 13:53:29,472][WARN ][index.engine             ] [lindevelastic1] [test-2015.04.27][0] failed to create new reader
org.apache.lucene.index.IndexNotFoundException: no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []

And it keeps repeating those errors until I delete the index.
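For reference, the index is created more or less like this (the shard count below is illustrative; the index name and shared data_path are the ones that show up in the logs, and every node has node.enable_custom_paths: true set, as you can see in the log lines above):

curl -XPUT 'localhost:9200/test-2015.04.27' -d '
{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "data_path": "/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices",
    "shadow_replicas": true
  }
}'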

Hi pweaver,

A couple of questions:

  • Is /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/ available on every node in the cluster? If you manually go to the /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index directory, can you see a segments_* file there? What is the listing of this directory on the second node (the one with the replica)?

It sounds like NFS may not be making the files created by the primary copy of the shard visible on the node holding the replica?

  • It looks like you are using NFS. Which version of NFS are you running? (One way to check is sketched below.)

Lucene still does not work well with NFS in general, because of trade-offs NFS makes in its file semantics (client-side caching, for example, can delay when newly written files become visible to other clients). I believe it is better with NFSv4, but still not entirely solved.
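If it helps, running something like this on each node should answer both questions (standard Linux NFS client tools assumed; adjust the path to match the shard you are looking at):

# which NFS version the mount is negotiated with
nfsstat -m
# or: mount | grep vol_na3_lindevelastic_nfs

# whether the segments file written by the primary is visible from this node
ls -l /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index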

Yes, the base directory (/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/) is available on each node.

When I first create the index (with no replicas), a directory gets created with the segments in it. For example, if node 4 gets the primary shard, I see this:

ls /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/4/test/0/index/

segments_1  segments.gen  write.lock

If I then increase number_of_replicas to 1, the errors start showing up in the logs, and this is what I see in the directory of another node, which was supposed to get the replica:

ls /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/2/test/0/index/

The directory is empty. Of course, the point of shadow replicas is that the data shouldn't be duplicated, right?
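(For completeness, the replica bump is just the standard settings update, something like:

curl -XPUT 'localhost:9200/test/_settings' -d '
{
  "index": {
    "number_of_replicas": 1
  }
}'

so there is nothing unusual on that side.)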

Perhaps I should set node.add_id_to_custom_path to false so that they use the exact same directory?

Yes, you should do this if you are running multiple nodes on the same machine.
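In elasticsearch.yml that would look roughly like this (both are node-level settings, so they go on every node and need a restart to take effect):

# allow indices to use a custom index.data_path at all
node.enable_custom_paths: true
# do not append the node's ordinal to the custom path, so all nodes resolve the same directory
node.add_id_to_custom_path: false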

Or if I'm sharing an NFS mount across multiple machines, right?