Error when using shadow replicas

Continuing the discussion from Avoiding duplicate data and work when using a shared filesystem:

I am using elasticsearch-1.5.0.

I am following the article at http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-shadow-replicas.html. I am able to create an index with shadow_replicas set to true and number_of_replicas set to 0. However, as soon as I increase number_of_replicas to any non-zero number, I get a flood of this in the logs:

[2015-05-07 13:53:29,358][WARN ][cluster.action.shard     ] [lindevelastic1] [test-2015.04.27][0] sending failed shard for [test-2015.04.27][0], node[At9q_V1rQVqXxT8m4EwVPg], [R], s[INITIALIZING], indexUUID [fXq4xQuYQkCEcKyGTJ3ZfA], reason [Failed to start shard, message [RecoveryFailedException[[test-2015.04.27][0]: Recovery failed from [lindevelastic2][LlGB__lyRCOL7cfWIZCE5g][lindevelastic2.vw.rentrak.com][inet[lindevelastic2.vw.rentrak.com/172.26.21.63:9300]]{enable_custom_paths=true, master=false} into [lindevelastic1][At9q_V1rQVqXxT8m4EwVPg][lindevelastic1.vw.rentrak.com][inet[lindevelastic1.vw.rentrak.com/172.26.21.62:9300]]{enable_custom_paths=true, master=true}]; nested: RemoteTransportException[[lindevelastic2][inet[/172.26.21.63:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[test-2015.04.27][0] Phase[2] Execution failed]; nested: RemoteTransportException[[lindevelastic1][inet[/172.26.21.62:9300]][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[[test-2015.04.27][0] failed to open index reader]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-05-07 13:53:29,358][WARN ][cluster.action.shard     ] [lindevelastic1] [test-2015.04.27][0] received shard failed for [test-2015.04.27][0], node[At9q_V1rQVqXxT8m4EwVPg], [R], s[INITIALIZING], indexUUID [fXq4xQuYQkCEcKyGTJ3ZfA], reason [Failed to start shard, message [RecoveryFailedException[[test-2015.04.27][0]: Recovery failed from [lindevelastic2][LlGB__lyRCOL7cfWIZCE5g][lindevelastic2.vw.rentrak.com][inet[lindevelastic2.vw.rentrak.com/172.26.21.63:9300]]{enable_custom_paths=true, master=false} into [lindevelastic1][At9q_V1rQVqXxT8m4EwVPg][lindevelastic1.vw.rentrak.com][inet[lindevelastic1.vw.rentrak.com/172.26.21.62:9300]]{enable_custom_paths=true, master=true}]; nested: RemoteTransportException[[lindevelastic2][inet[/172.26.21.63:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[test-2015.04.27][0] Phase[2] Execution failed]; nested: RemoteTransportException[[lindevelastic1][inet[/172.26.21.62:9300]][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[[test-2015.04.27][0] failed to open index reader]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-05-07 13:53:29,396][WARN ][cluster.action.shard     ] [lindevelastic1] [test-2015.04.27][0] received shard failed for [test-2015.04.27][0], node[akGDHoQnQAqBWS3SbOhMoQ], [R], s[INITIALIZING], indexUUID [fXq4xQuYQkCEcKyGTJ3ZfA], reason [Failed to start shard, message [RecoveryFailedException[[test-2015.04.27][0]: Recovery failed from [lindevelastic2][LlGB__lyRCOL7cfWIZCE5g][lindevelastic2.vw.rentrak.com][inet[/172.26.21.63:9300]]{enable_custom_paths=true, master=false} into [lindevelastic4][akGDHoQnQAqBWS3SbOhMoQ][lindevelastic4.vw.rentrak.com][inet[lindevelastic4.vw.rentrak.com/172.26.21.65:9300]]{enable_custom_paths=true, master=false}]; nested: RemoteTransportException[[lindevelastic2][inet[/172.26.21.63:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[test-2015.04.27][0] Phase[2] Execution failed]; nested: RemoteTransportException[[lindevelastic4][inet[/172.26.21.65:9300]][internal:index/shard/recovery/prepare_translog]]; nested: EngineCreationFailureException[[test-2015.04.27][0] failed to open index reader]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/3/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/3/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []]; ]]
[2015-05-07 13:53:29,472][WARN ][index.engine             ] [lindevelastic1] [test-2015.04.27][0] failed to create new reader
org.apache.lucene.index.IndexNotFoundException: no segments* file found in store(least_used[rate_limited(default(mmapfs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index),niofs(/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index)), type=MERGE, rate=20.0)]): files: []

And it keeps repeating those errors until I delete the index.
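For reference, the index is created more or less like this (the shard count below is illustrative; the index name and shared data_path are the ones that show up in the logs, and every node has node.enable_custom_paths: true set, as you can see in the log lines above):

curl -XPUT 'localhost:9200/test-2015.04.27' -d '
{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "data_path": "/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices",
    "shadow_replicas": true
  }
}'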

Hi pweaver,

A couple of questions:

  • Is /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/ available on every node in the cluster? If you manually go to the /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index directory, can you see a segments_* file there? What is the listing of this directory on the second node (the one with the replica)?

It sounds like NFS may not be making the files created by the primary copy of the shard visible on the node holding the replica?

  • It looks like you are using NFS. Which version of NFS are you running? (One way to check is sketched below.)

Lucene still does not work well with NFS in general, because of trade-offs NFS makes in its file semantics (client-side caching, for example, can delay when newly written files become visible to other clients). I believe it is better with NFSv4, but still not entirely solved.
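If it helps, running something like this on each node should answer both questions (standard Linux NFS client tools assumed; adjust the path to match the shard you are looking at):

# which NFS version the mount is negotiated with
nfsstat -m
# or: mount | grep vol_na3_lindevelastic_nfs

# whether the segments file written by the primary is visible from this node
ls -l /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/0/test-2015.04.27/0/index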

Yes, the base directory (/mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/) is available on each node.

When I first create the index (with no replicas), a directory gets created with the segments in it. For example, if node 4 gets the primary shard, I see this:

ls /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/4/test/0/index/

segments_1  segments.gen  write.lock

If I then increase number_of_replicas to 1, the errors start showing up in the logs, and this is what I see in the directory of another node, which was supposed to get the replica:

ls /mnt/nfs/vol_na3_lindevelastic_nfs/shared/elasticsearch/shadow-indices/2/test/0/index/

The directory is empty. Of course, the point of shadow replicas is that the data shouldn't be duplicated, right?
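(For completeness, the replica bump is just the standard settings update, something like:

curl -XPUT 'localhost:9200/test/_settings' -d '
{
  "index": {
    "number_of_replicas": 1
  }
}'

so there is nothing unusual on that side.)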

Perhaps I should set node.add_id_to_custom_path to false so that they use the exact same directory?

Yes, you should do this if you are running multiple nodes on the same machine.
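In elasticsearch.yml that would look roughly like this (both are node-level settings, so they go on every node and need a restart to take effect):

# allow indices to use a custom index.data_path at all
node.enable_custom_paths: true
# do not append the node's ordinal to the custom path, so all nodes resolve the same directory
node.add_id_to_custom_path: false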

Or if I'm sharing an NFS mount across multiple machines, right?