Damaged ES cluster after upgrade - serious problem - please help

Grzegorz_K · December 17, 2014, 2:23pm

Hello,

I have upgraded Elasticsearch from version 0.90.3 to version 1.3.4 (OS: Debian
Wheezy, installed from the deb package).
This is a three-node cluster using unicast discovery.
The upgrade was done with Elasticsearch shut down.
After starting the new version, the cluster health is 'yellow' according to the
head plugin, but 'red' according to curl against _cluster/health.

Three indices in the cluster have three unassigned shards.

The logs on all nodes contain many "Corrupted index" and
"sending failed shard for" messages.

Would upgrading to version 1.4.2 fix the problem (because of the Lucene issue
LUCENE-5975)?
Deleting the indices and reindexing is a last resort.

ES state from first node:

curl -XGET 'http://127.0.0.1:9200/_cluster/health?pretty=true'
{
"cluster_name" : "searchcass",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 283,
"active_shards" : 576,
"relocating_shards" : 0,
"initializing_shards" : 3,
"unassigned_shards" : 3
}
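
For anyone who wants to read a health response like the one above from a script, here is a minimal Python sketch. It is not part of the original post: the function name `summarize_health` is hypothetical, the JSON is the response quoted above, and the status descriptions follow the usual green/yellow/red meaning of the cluster health API.

```python
import json

# Health response as posted above.
HEALTH = """
{
  "cluster_name" : "searchcass",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 283,
  "active_shards" : 576,
  "relocating_shards" : 0,
  "initializing_shards" : 3,
  "unassigned_shards" : 3
}
"""

# Meaning of each cluster health status.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, but at least one replica is not",
    "red": "at least one primary shard is not allocated",
}


def summarize_health(raw):
    """Hypothetical helper: turn a _cluster/health JSON body into one line."""
    health = json.loads(raw)
    status = health["status"]
    initializing = health["initializing_shards"]
    unassigned = health["unassigned_shards"]
    return (
        f'{health["cluster_name"]}: {status} '
        f"({STATUS_MEANING.get(status, 'unknown status')}); "
        f"{initializing + unassigned} shard copies not active "
        f"({initializing} initializing, {unassigned} unassigned)"
    )


print(summarize_health(HEALTH))
```

A 'red' status here is consistent with the logs below, where the failing shards are all marked [P] (primary).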

How can I fix it? Please reply.

Regards

Grzesiek

ES log from node 1 (search01):
...
[2014-12-17 11:04:20,176][WARN ][cluster.action.shard ] [search01]
[201205][0] received shard failed for [201205][0],
node[OWUJ3lZbT5i00JKgrDFUcw], [P], s[INITIALIZING], indexUUID [na],
reason [master
[search01][HYtX23nPS7uU-DeY-zF6AA][search01][inet[/192.168.199.211:9300]]
marked shard as initializing, but shard is marked as failed, resend shard
failure]
[2014-12-17 11:04:20,253][WARN ][indices.cluster ] [search01]
[201301][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[201301][0] failed to fetch index version after copying it over
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.lucene.index.CorruptIndexException: [201301][0]
Corrupted index [corrupted_cFQBoZ-WTK2sW8mgUUv1vw] caused by:
CorruptIndexException[did not read all bytes from file: read 9650 vs size
9651 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201301/0/index/_5f9v_k.del")))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more
[2014-12-17 11:04:20,279][WARN ][cluster.action.shard ] [search01]
[201304][4] received shard failed for [201304][4],
node[zygoKW7SR6CwvanVoNrPcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201304][4] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201304][4]
Corrupted index [corrupted_7hrGiX_jTx2KLbQUIAiLpg] caused by:
CorruptIndexException[did not read all bytes from file: read 295641 vs size
295642 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201304/4/index/_294h_17.del")))]];
]]
[2014-12-17 11:04:20,305][WARN ][cluster.action.shard ] [search01]
[201304][4] received shard failed for [201304][4],
node[zygoKW7SR6CwvanVoNrPcw], [P], s[INITIALIZING], indexUUID [na],
reason [master
[search01][HYtX23nPS7uU-DeY-zF6AA][search01][inet[/192.168.199.211:9300]]
marked shard as initializing, but shard is marked as failed, resend shard
failure]
[2014-12-17 11:04:20,329][WARN ][cluster.action.shard ] [search01]
[201301][0] sending failed shard for [201301][0],
node[HYtX23nPS7uU-DeY-zF6AA], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201301][0] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201301][0]
Corrupted index [corrupted_cFQBoZ-WTK2sW8mgUUv1vw] caused by:
CorruptIndexException[did not read all bytes from file: read 9650 vs size
9651 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcassandra/nodes/0/indices/201301/0/index/_5f9v_k.del")))]];
]]
[2014-12-17 11:04:20,329][WARN ][cluster.action.shard ] [search01]
[201301][0] received shard failed for [201301][0],
node[HYtX23nPS7uU-DeY-zF6AA], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201301][0] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201301][0]
Corrupted index [corrupted_cFQBoZ-WTK2sW8mgUUv1vw] caused by:
CorruptIndexException[did not read all bytes from file: read 9650 vs size
9651 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201301/0/index/_5f9v_k.del")))]];
]]
[2014-12-17 11:04:20,331][WARN ][cluster.action.shard ] [search01]
[201301][0] received shard failed for [201301][0],
node[HYtX23nPS7uU-DeY-zF6AA], [P], s[INITIALIZING], indexUUID [na],
reason [master
[search01][HYtX23nPS7uU-DeY-zF6AA][search01][inet[/192.168.199.211:9300]]
marked shard as initializing, but shard is marked as failed, resend shard
failure]
...

ES log from node 2 (search02):

[2014-12-17 11:10:11,971][WARN ][cluster.action.shard ] [search02]
[201301][0] sending failed shard for [201301][0],
node[OWUJ3lZbT5i00JKgrDFUcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201301][0] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201301][0]
Corrupted index [corrupted_U1eBtw3YRYKcfuV9ZHPadw] caused by:
CorruptIndexException[did not read all bytes from file: read 9650 vs size
9651 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201301/0/index/_5f9v_k.del")))]];
]]
[2014-12-17 11:10:12,258][WARN ][indices.cluster ] [search02]
[201205][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[201205][0] failed to fetch index version after copying it over
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.lucene.index.CorruptIndexException: [201205][0]
Corrupted index [corrupted_xCs6wOMpR-G3pbQfUpn-Ww] caused by:
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201205/0/index/_1ys_3.del")))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more
[2014-12-17 11:10:12,278][WARN ][indices.cluster ] [search02]
[201304][4] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[201304][4] failed to fetch index version after copying it over
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.lucene.index.CorruptIndexException: [201304][4]
Corrupted index [corrupted_mfMa6wjdT1m6QZ6WUBHKrA] caused by:
CorruptIndexException[did not read all bytes from file: read 295641 vs size
295642 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201304/4/index/_294h_17.del")))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more
[2014-12-17 11:10:12,282][WARN ][cluster.action.shard ] [search02]
[201205][0] sending failed shard for [201205][0],
node[OWUJ3lZbT5i00JKgrDFUcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201205][0] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201205][0]
Corrupted index [corrupted_xCs6wOMpR-G3pbQfUpn-Ww] caused by:
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201205/0/index/_1ys_3.del")))]];
]]
[2014-12-17 11:10:12,297][WARN ][cluster.action.shard ] [search02]
[201304][4] sending failed shard for [201304][4],
node[OWUJ3lZbT5i00JKgrDFUcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201304][4] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201304][4]
Corrupted index [corrupted_mfMa6wjdT1m6QZ6WUBHKrA] caused by:
CorruptIndexException[did not read all bytes from file: read 295641 vs size
295642 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201304/4/index/_294h_17.del")))]];
]]

ES log from node 3 (search03):

[2014-12-17 11:13:49,541][WARN ][cluster.action.shard ] [search03]
[201205][0] sending failed shard for [201205][0],
node[zygoKW7SR6CwvanVoNrPcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201205][0] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201205][0]
Corrupted index [corrupted_weSqXhW_T9Wle8wEHhEnXw] caused by:
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201205/0/index/_1ys_3.del")))]];
]]
[2014-12-17 11:13:49,581][WARN ][indices.cluster ] [search03]
[201304][4] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[201304][4] failed to fetch index version after copying it over
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.lucene.index.CorruptIndexException: [201304][4]
Corrupted index [corrupted_7hrGiX_jTx2KLbQUIAiLpg] caused by:
CorruptIndexException[did not read all bytes from file: read 295641 vs size
295642 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201304/4/index/_294h_17.del")))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more
[2014-12-17 11:13:49,651][WARN ][cluster.action.shard ] [search03]
[201304][4] sending failed shard for [201304][4],
node[zygoKW7SR6CwvanVoNrPcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201304][4] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201304][4]
Corrupted index [corrupted_7hrGiX_jTx2KLbQUIAiLpg] caused by:
CorruptIndexException[did not read all bytes from file: read 295641 vs size
295642 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201304/4/index/_294h_17.del")))]];
]]
[2014-12-17 11:13:49,747][WARN ][indices.cluster ] [search03]
[201205][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[201205][0] failed to fetch index version after copying it over
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.lucene.index.CorruptIndexException: [201205][0]
Corrupted index [corrupted_weSqXhW_T9Wle8wEHhEnXw] caused by:
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201205/0/index/_1ys_3.del")))]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:353)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:338)
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more
[2014-12-17 11:13:49,823][WARN ][cluster.action.shard ] [search03]
[201205][0] sending failed shard for [201205][0],
node[zygoKW7SR6CwvanVoNrPcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[201205][0] failed to fetch index
version after copying it over]; nested: CorruptIndexException[[201205][0]
Corrupted index [corrupted_weSqXhW_T9Wle8wEHhEnXw] caused by:
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201205/0/index/_1ys_3.del")))]];
]]

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/746145b6-dd27-468c-af1e-50b4685b1a38%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter_Portante · December 17, 2014, 6:21pm

On Wednesday, December 17, 2014 9:23:28 AM UTC-5, Grzegorz K wrote:

Caused by: org.apache.lucene.index.CorruptIndexException: [201205][0]
Corrupted index [corrupted_xCs6wOMpR-G3pbQfUpn-Ww] caused by:
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201205/0/index/_1ys_3.del")))]

Off by one byte on all these, it seems, perhaps a clue?
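
One way to check that pattern across every logged file is to parse the log text. What follows is a minimal Python sketch, not something from the original posts: the function name `size_mismatches` and the regular expression are illustrative assumptions, and it expects log lines in the form shown in the first post, where each path sits unbroken on one line.

```python
import re

# Matches the Lucene message even when it wraps across lines:
#   read N vs size M (resource: ...path="...")
MISMATCH = re.compile(
    r'read\s+(\d+)\s+vs\s+size\s+(\d+)\s+\(resource:\s*.*?path="([^"]+)"',
    re.DOTALL,
)


def size_mismatches(log_text):
    """Return {path: (bytes_read, file_size, file_size - bytes_read)}."""
    result = {}
    for read, size, path in MISMATCH.findall(log_text):
        result[path] = (int(read), int(size), int(size) - int(read))
    return result


# Three messages copied from the logs in this thread.
SAMPLE = '''CorruptIndexException[did not read all bytes from file: read 9650 vs size
9651 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201301/0/index/_5f9v_k.del")))]
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201205/0/index/_1ys_3.del")))]
CorruptIndexException[did not read all bytes from file: read 295641 vs size
295642 (resource:
BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/nodes/0/indices/201304/4/index/_294h_17.del")))]
'''

for path, (read, size, diff) in sorted(size_mismatches(SAMPLE).items()):
    print(f"{path}: read {read}, size {size}, difference {diff}")
```

Running the same function over a full node log would show whether any file differs by something other than one byte, and whether anything besides .del files is affected.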



warkolm · December 17, 2014, 6:44pm

Did you take a backup?
Did you upgrade directly from 0.90.3 to 1.3.4?

On 17 December 2014 at 19:21, Peter Portante peter.a.portante@gmail.com
wrote:

On Wednesday, December 17, 2014 9:23:28 AM UTC-5, Grzegorz K wrote:

[2014-12-17 11:04:20,329][WARN ][cluster.action.shard ] [search01]
[201301][0] sending failed shard for [201301][0],
node[HYtX23nPS7uU-DeY-zF6AA], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[201301][0]
failed to fetch index version after copying it over]; nested:
CorruptIndexException[[201301][0] Corrupted index [corrupted_cFQBoZ-WTK2sW8mgUUv1vw]
caused by: CorruptIndexException[did not read all bytes from file: read
9650 vs size 9651 (resource: BufferedChecksumIndexInput(
NIOFSIndexInput(path="/var/lib/elasticsearch/searchcassandra/nodes/0/
indices/201301/0/index/_5f9v_k.del")))]]; ]]
[2014-12-17 11:04:20,329][WARN ][cluster.action.shard ] [search01]
[201301][0] received shard failed for [201301][0],
node[HYtX23nPS7uU-DeY-zF6AA], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[201301][0]
failed to fetch index version after copying it over]; nested:
CorruptIndexException[[201301][0] Corrupted index [corrupted_cFQBoZ-WTK2sW8mgUUv1vw]
caused by: CorruptIndexException[did not read all bytes from file: read
9650 vs size 9651 (resource: BufferedChecksumIndexInput(
NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/
nodes/0/indices/201301/0/index/_5f9v_k.del")))]]; ]]
[2014-12-17 11:04:20,331][WARN ][cluster.action.shard ] [search01]
[201301][0] received shard failed for [201301][0],
node[HYtX23nPS7uU-DeY-zF6AA], [P], s[INITIALIZING], indexUUID [na],
reason [master [search01][HYtX23nPS7uU-DeY-zF6AA][search01][inet[/192.
168.199.211:9300]] marked shard as initializing, but shard is marked as
failed, resend shard failure]
...

ES log from node 2 (search02):

[2014-12-17 11:10:11,971][WARN ][cluster.action.shard ] [search02]
[201301][0] sending failed shard for [201301][0],
node[OWUJ3lZbT5i00JKgrDFUcw], [P], s[INITIALIZING], indexUUID [na],
reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[201301][0]
failed to fetch index version after copying it over]; nested:
CorruptIndexException[[201301][0] Corrupted index [corrupted_U1eBtw3YRYKcfuV9ZHPadw]
caused by: CorruptIndexException[did not read all bytes from file: read
9650 vs size 9651 (resource: BufferedChecksumIndexInput(
NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/
nodes/0/indices/201301/0/index/_5f9v_k.del")))]]; ]]
[2014-12-17 11:10:12,258][WARN ][indices.cluster ] [search02]
[201205][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[201205][0] failed to fetch index version after copying it over
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.
recover(LocalIndexShardGateway.java:152)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.
run(IndexShardGatewayService.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.lucene.index.CorruptIndexException: [201205][0]
Corrupted index [corrupted_xCs6wOMpR-G3pbQfUpn-Ww] caused by:
CorruptIndexException[did not read all bytes from file: read 205 vs size
206 (resource: BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/
lib/elasticsearch/searchcass/nodes/0/indices/201205/0/
index/_1ys_3.del")))]

Off by one byte on all these, it seems, perhaps a clue?
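
The off-by-one observation can be checked mechanically against the quoted log text. The following is only a minimal sketch: the helper name `size_gaps` and the shortened excerpt are illustrative assumptions, and the script merely confirms the pattern in the messages; it says nothing about the cause.

```python
import re

# Excerpts taken from the log messages quoted in this thread
# (line breaks kept as they appear in the email).
LOG_EXCERPT = """
CorruptIndexException[did not read all bytes from file: read 9650 vs size
9651 (resource: BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/
lib/elasticsearch/searchcass/nodes/0/indices/201301/0/
index/_5f9v_k.del")))]
CorruptIndexException[did not read all bytes from file: read 295641 vs size
295642 (resource: BufferedChecksumIndexInput(NIOFSIndexInput(path="/var/
lib/elasticsearch/searchcass/nodes/0/indices/201304/4/
index/_294h_17.del")))]
CorruptIndexException[did not read all bytes from file: read 205
vs size 206 (resource: BufferedChecksumIndexInput(
NIOFSIndexInput(path="/var/lib/elasticsearch/searchcass/
nodes/0/indices/201205/0/index/_1ys_3.del")))]]; ]]
"""

# "read <n> vs size <m>", tolerating the line wraps introduced by the mail client.
PATTERN = re.compile(r"read\s+(\d+)\s+vs\s+size\s+(\d+)")


def size_gaps(log_text):
    """Return sorted unique (bytes_read, file_size, gap) triples found in log_text."""
    triples = {
        (int(read), int(size), int(size) - int(read))
        for read, size in PATTERN.findall(log_text)
    }
    return sorted(triples)


gaps = size_gaps(LOG_EXCERPT)
for bytes_read, file_size, gap in gaps:
    print(f"read {bytes_read} of {file_size} bytes, gap = {gap}")
```

Running the same function over the full log files from each node would show whether every reported file is short by exactly one byte or only the `.del` files quoted here.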


Grzegorz_K · December 17, 2014, 8:07pm

On Wednesday, December 17, 2014 7:44:54 PM UTC+1, Mark Walkom wrote:

Did you take a backup?

Yes, I have a backup of the data directory.

Did you go from 0.90.0 to 1.3.4 directly?

Yes, the upgrade went directly from 0.90.3 to 1.3.4.

