Delete Unassigned replica shards

elk2 · May 27, 2019, 9:56am

After a cluster crash caused by filesystem corruption, I restarted all 5 Elasticsearch nodes.

I currently have 30 unassigned replica shards that Elasticsearch keeps trying to reassign without success.
I want to delete them without losing the primary shards.
Is it possible to delete only the unassigned replica shards? What happens if I delete them?

These are the logs from my old Elasticsearch 1.3 cluster:

[2019-05-27 11:15:53,058][WARN ][cluster.action.shard     ] [Node4] [.marvel-2019.05.20][0] sending failed shard for [.marvel-2019.05.20][0], node[UXetSPnwQIuchTbBlonOrA], [P], s[INITIALIZING], indexUUID [LAWodau7QsOPINKIAwFJDg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[.marvel-2019.05.20][0] failed recovery]; nested: EngineCreationFailureException[[.marvel-2019.05.20][0] failed to create engine]; nested: EOFException[read past EOF: NIOFSIndexInput(path="/home/elastic/ELK/Node4/data/My-ELK/nodes/0/indices/.marvel-2019.05.20/0/index/_oap.cfs")]; ]]
[2019-05-27 11:15:53,342][WARN ][indices.cluster          ] [Node4] [logstash-2019.05.20][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2019.05.20][3] failed to fetch index version after copying it over
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.index.CorruptIndexException: [logstash-2019.05.20][3] Corrupted index [corrupted_0Qn9G6RVSvqqDw401kcqNQ] caused by: CorruptIndexException[codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/home/elastic/ELK/Node4/data/My-ELK/nodes/0/indices/logstash-2019.05.20/3/index/_6ou.cfs"))]
        at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
        at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
        ... 4 more
[2019-05-27 11:15:53,346][WARN ][cluster.action.shard     ] [Node4] [logstash-2019.05.20][3] sending failed shard for [logstash-2019.05.20][3], node[UXetSPnwQIuchTbBlonOrA], [P], s[INITIALIZING], indexUUID [9gVFPrm9T36C7U0-zfpT3w], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2019.05.20][3] failed to fetch index version after copying it over]; nested: CorruptIndexException[[logstash-2019.05.20][3] Corrupted index [corrupted_0Qn9G6RVSvqqDw401kcqNQ] caused by: CorruptIndexException[codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/home/elastic/ELK/Node4/data/My-ELK/nodes/0/indices/logstash-2019.05.20/3/index/_6ou.cfs"))]]; ]]
[2019-05-27 11:15:58,371][WARN ][indices.cluster          ] [Node4] [.marvel-2019.05.20][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [.marvel-2019.05.20][0] failed recovery
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [.marvel-2019.05.20][0] failed to create engine
        at org.elasticsearch.index.engine.internal.InternalEngine.start(InternalEngine.java:277)
        at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:714)
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:225)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
        ... 3 more
Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/home/elastic/ELK/Node4/data/My-ELK/nodes/0/indices/.marvel-2019.05.20/0/index/_oap.cfs")
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:336)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
        at org.apache.lucene.store.DataInput.readVInt(DataInput.java:120)
        at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
        at org.apache.lucene.store.CompoundFileDirectory.readEntries(CompoundFileDirectory.java:139)
        at org.apache.lucene.store.CompoundFileDirectory.<init>(CompoundFileDirectory.java:105)
        at org.apache.lucene.index.SegmentReader.readFieldInfos(SegmentReader.java:280)
        at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:835)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:787)
        at org.elasticsearch.index.engine.internal.InternalEngine.createWriter(InternalEngine.java:1407)
        at org.elasticsearch.index.engine.internal.InternalEngine.start(InternalEngine.java:271)
        ... 6 more
[2019-05-27 11:15:58,372][WARN ][cluster.action.shard     ] [Node4] [.marvel-2019.05.20][0] sending failed shard for [.marvel-2019.05.20][0], node[UXetSPnwQIuchTbBlonOrA], [P], s[INITIALIZING], indexUUID [LAWodau7QsOPINKIAwFJDg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[.marvel-2019.05.20][0] failed recovery]; nested: EngineCreationFailureException[[.marvel-2019.05.20][0] failed to create engine]; nested: EOFException[read past EOF: NIOFSIndexInput(path="/home/elastic/ELK/Node4/data/My-ELK/nodes/0/indices/.marvel-2019.05.20/0/index/_oap.cfs")]; ]]

elk11 · May 27, 2019, 5:14pm

I want to delete them without losing the primary shards.

Deleting replica shards has absolutely no impact on primary shards!

Is it possible to delete only the unassigned replica shards?

Yes. To achieve this, reduce 'number_of_replicas' for the entire cluster or for the desired index/indices. To update the number of replicas, follow this: Update Indices Settings | Elasticsearch Guide [6.4] | Elastic

(Change the version number in the URL to your Elasticsearch version.)
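
As a rough sketch of the per-index variant (not tested against 1.3): the helper names `replica_settings_body` and `set_replicas` are made up for illustration, `logstash-2019.05.27` is only an example index, and `localhost:9200` is assumed to be a reachable node. The update indices settings call could be wrapped roughly like this:

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of Elasticsearch.

# Print the JSON body for the update-settings request.
# $1 = desired replica count
replica_settings_body() {
    printf '{"index":{"number_of_replicas":%d}}\n' "$1"
}

# Send the new replica count for ONE index to the update indices settings API.
# $1 = index name, $2 = replica count, $3 = host:port (assumed default below)
# The Content-Type header is needed on 6.x and later; older releases should
# simply ignore it (assumption, check against your version).
set_replicas() {
    curl -s -XPUT "http://${3:-localhost:9200}/$1/_settings" \
        -H 'Content-Type: application/json' \
        -d "$(replica_settings_body "$2")"
}

# Example calls (commented out so nothing runs by accident):
# set_replicas logstash-2019.05.27 0   # drop the replicas of this index only
# set_replicas logstash-2019.05.27 1   # later: let them be rebuilt from the primaries
```

Keep in mind that setting the count to 0 drops every replica of that index, including the healthy ones, so raising it again recopies all shards of that index from the primaries.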

elk2 · May 28, 2019, 8:47am

Thanks elk11,

Is it possible to delete the shards without reducing number_of_replicas for the entire cluster?
I have a lot of documents, and I think reducing the replicas and then increasing them again would cause a big overhead.

I found the problem. The unassigned replica shards already exist in the index directories on some nodes. I think that when the cluster came back up, it recreated the replica shards on the nodes. Now that all nodes are up, it finds more replicas than expected.

elk11 · May 28, 2019, 2:48pm

Is it possible to delete the shards without reducing number_of_replicas for the entire cluster?

You can reduce the number of replicas for a particular index/indices too.

I have a lot of documents, and I think reducing the replicas and then increasing them again would cause a big overhead.

I don't quite understand what you are trying to achieve with this.

Now that all nodes are up, it finds more replicas than expected.

I don't think that is ever possible. You cannot get more replicas than you configured!

elk2 · May 28, 2019, 4:08pm

Ok.

I thought I had to remove the replicas for the entire cluster. After the cluster crash, I am worried that the cluster would not handle the load of recreating all the replicas and would crash again. But if I can remove the replicas of a particular index, I can do it without problems.

I will show you:

-bash-4.1$ curl -s localhost:9200/_cat/shards | grep logstash-2019.05.27
logstash-2019.05.27  0 r STARTED      1739419    1.2gb 192.168.0.4    Node4
logstash-2019.05.27  0 p STARTED      1741863    1.2gb 192.168.0.3 Node3
logstash-2019.05.27  3 p STARTED      1737897    1.2gb 192.168.0.1 Node1
logstash-2019.05.27  3 r UNASSIGNED
logstash-2019.05.27  1 r STARTED      1740639    1.2gb 192.168.0.4     Node4
logstash-2019.05.27  1 p STARTED      1743116    1.2gb 192.168.0.3     Node3
logstash-2019.05.27  2 p STARTED      1737028    1.2gb 192.168.0.1  Node1
logstash-2019.05.27  2 r UNASSIGNED

Now if I look on Node4, I see the folders for shards 0, 1, 2 and 3, but Node4 holds only shards 0 and 1.

[root@Node4 /]$  ls -l /home/elastic/ELK/Node4/data/My-ELK/nodes/0/indices/logstash-2019.05.27/
total 20
drwxr-xr-x 5 root root 4096 May 27 15:39 0
drwxr-xr-x 5 root root 4096 May 27 15:40 1
drwxr-xr-x 5 root root 4096 May 27 01:26 2
drwxr-xr-x 5 root root 4096 May 27 01:26 3
drwxr-xr-x 2 root root 4096 May 27 16:27 _state

I think this is not normal, because in other folders I don't have all the shard folders.
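
To pick the unassigned replicas out of a `_cat/shards` listing like the one in this post, a small awk filter might help. This is only a sketch: the function name `unassigned_replicas` is made up, and it assumes the default column order (index, shard, prirep, state, ...) seen in the output of this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `_cat/shards` output on stdin and prints
# "index shard" for every replica copy whose state is UNASSIGNED.
# Assumes the default columns: index, shard, prirep (p/r), state, ...
unassigned_replicas() {
    awk '$3 == "r" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Example against a live node (commented out; localhost:9200 is an assumption):
# curl -s localhost:9200/_cat/shards | unassigned_replicas
```

Fed the `logstash-2019.05.27` lines from this post, it would print `logstash-2019.05.27 3` and `logstash-2019.05.27 2`. Matching on the state column avoids false hits that a plain `grep` could produce on index or node names.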

elk2 · May 29, 2019, 12:22pm

I solved most of the unassigned replicas.
Some primary shards still remain in INITIALIZING status. How can I solve this?

-bash-4.1$  curl -s localhost:9200/_cat/shards | grep INI
logstash-2019.05.20  3 p INITIALIZING                  192.168.0.3     Node3
logstash-2019.05.20  2 p INITIALIZING                  192.168.0.3     Node3
.marvel-2019.05.20    0 p INITIALIZING                  192.168.0.1 Node1


[2019-05-29 13:52:32,735][WARN ][cluster.action.shard     ] [Node1] [.marvel-2019.05.20][0] sending failed shard for [.marvel-2019.05.20][0], node[fsazd6S2RzSnkYCrPxFFlQ], [P], s[INITIALIZING], indexUUID [LAWodau7QsOPINKIAwFJDg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[.marvel-2019.05.20][0] failed to recover shard]; nested: IllegalArgumentException[No type mapped for [0]]; ]]
[2019-05-29 13:52:36,576][WARN ][indices.cluster          ] [Node1] [.marvel-2019.05.20][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [.marvel-2019.05.20][0] failed to recover shard
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:269)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: No type mapped for [0]
        at org.elasticsearch.index.translog.Translog$Operation$Type.fromId(Translog.java:224)
        at org.elasticsearch.index.translog.TranslogStreams.readTranslogOperation(TranslogStreams.java:34)
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:241)
        ... 4 more

elk11 · May 29, 2019, 1:31pm

Which version of elasticsearch are you on?

Christian_Dahlqvist · May 29, 2019, 1:38pm

Given that you are on Elasticsearch 1.3 you really, really should upgrade, at least to the latest 1.x release, but ideally further.

elk2 · May 29, 2019, 2:18pm

Thanks Christian, I know, but we can't upgrade for another 6 months for administrative reasons.

In the meantime, I would like to solve these issues without upgrading, if a solution exists.

Christian_Dahlqvist · May 30, 2019, 3:09pm

I do not know as I have not used that version in many years.
