Failing Replica Shards

David_Kleiner · August 26, 2014, 8:08pm

Hello,

Over the past couple of days I've been getting a lot of error messages about
corrupted replica shards. The primary shards come up quickly after an ES
process restart, but the replicas take a long time to come back. Sometimes it
takes a few node restarts to 'kick' the nodes into starting the replica shards.

ES version is 1.3.1 running on CentOS 6.5 hosted at Softlayer. It's a
3-way cluster with 4 logstash feeders hanging off it.

Here are the errors:

[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 /
Salvador Dali] [downloader-2014.08][4] received shard failed for
[downloader-2014.08][4], node[l9-BQTHSSF-ElhgpPBZ24w], [R],
s[INITIALIZING], indexUUID [2vRrb5YlQP6MTVr1chOezg], reason [engine
failure, message [corrupted preexisting
index][CorruptIndexException[[downloader-2014.08][4] Corrupted index
[corrupted_SkU0-ZHZRxivSnGczABb_g] caused by: CorruptIndexException[codec
footer mismatch: actual footer=-1676705023 vs expected footer=-1071082520
(resource:
NIOFSIndexInput(path="/acc/ES/NBS/nodes/0/indices/downloader-2014.08/4/index/_k9a_es090_0.doc"))]]]]
[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 /
Salvador Dali] [eventlog-2014.06][0] received shard failed for
[eventlog-2014.06][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING],
indexUUID [jbvChdRrRB6HTutxPvxMmQ], reason [engine failure, message
[corrupted preexisting index][CorruptIndexException[[eventlog-2014.06][0]
Corrupted index [corrupted__712QIBQQqafzpBoQwZtcg] caused by:
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected
footer=-1071082520 (resource:
NIOFSIndexInput(path="/acc/ES/NBS/nodes/0/indices/eventlog-2014.06/0/index/_1k4x.nvd"))]]]]
[2014-08-26 15:01:18,684][WARN ][cluster.action.shard ] [log03 /
Salvador Dali] [eventlog-2014.07][0] received shard failed for
[eventlog-2014.07][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING],
indexUUID [T4tTXkPjTaCdSVNTjHfOcg], reason [engine failure, message
[corrupted preexisting index][CorruptIndexException[[eventlog-2014.07][0]
Corrupted index [corrupted_OzfNRRGyTIq8a1PRhLYG2w] caused by:
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected
footer=-1071082520 (resource:
NIOFSIndexInput(path="/acc/ES/NBS/nodes/0/indices/eventlog-2014.07/0/index/_rqf.nvd"))]]]]

Thanks,

David

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c0af53fb-6fdd-4624-bf6c-9b9d50081689%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mehmet_Cem_Gunturkun · November 29, 2014, 10:48am

Hey David, I have the same problem now. Have you found a solution for it?

David_Kleiner · November 29, 2014, 11:57pm

Hello Mehmet,

For two indices with problematic shards (symptoms: a shard starts recovering,
the recovery stops, and recovery is then attempted on a different node), I
manually changed the replica count to 1 and then to 2. With one big index
(over 90G, I think), I was never able to recover the dual replica set;
thankfully it was OK to drop it. Upgrading to a more recent ES version helped
too.

HTH,

David
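
The replica-count change David describes could be sketched roughly as follows. This is only an illustration, not his actual procedure: it assumes the cluster is reachable over HTTP at a base URL such as `http://localhost:9200` (an assumption, not from the thread) and uses the update index settings endpoint (`PUT /<index>/_settings`) plus the cluster health endpoint to wait between steps. The function names and the timeout value are hypothetical.

```python
import json
import urllib.request


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def health_url(base_url, index, timeout="10m"):
    """URL that blocks until the index is green or the timeout expires."""
    return f"{base_url}/_cluster/health/{index}?wait_for_status=green&timeout={timeout}"


def step_replicas(base_url, index, counts=(1, 2)):
    """Apply each replica count in turn, waiting for green in between.

    The base URL and the (1, 2) sequence are assumptions for illustration.
    """
    for count in counts:
        with urllib.request.urlopen(replica_settings_request(base_url, index, count)) as resp:
            print("set replicas to", count, "HTTP", resp.status)
        with urllib.request.urlopen(health_url(base_url, index)) as resp:
            print(json.loads(resp.read()).get("status"))
```

Calling `step_replicas("http://localhost:9200", "eventlog-2014.07")` would drop the index to one replica, wait for it to go green (or time out), and then raise it back to two.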

Jakub_Podeszwik · November 30, 2014, 11:00am

I've had similar problems. Two things that helped:

1. If the index had more than one shard, optimizing it to one shard usually
   worked.
2. Otherwise, manually copying the shard files from the node holding the
   primary shard to one of the nodes that kept failing.

Jakub_Podeszwik · November 30, 2014, 11:05am

Small mistake. Item 1 should be:

If a shard had more than one segment, optimizing it down to one segment
usually worked.
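
For reference, merging an index down to a single segment on an ES 1.x cluster might look something like the sketch below. It assumes the 1.x `_optimize` endpoint with its `max_num_segments` parameter (later releases renamed this endpoint to `_forcemerge`) and a cluster reachable at a base URL such as `http://localhost:9200`; the base URL and the function name are illustrative assumptions, not something stated in the thread.

```python
import urllib.parse
import urllib.request


def optimize_request(base_url, index, max_num_segments=1):
    """Build (but do not send) a POST to the ES 1.x _optimize endpoint."""
    query = urllib.parse.urlencode({"max_num_segments": max_num_segments})
    return urllib.request.Request(
        url=f"{base_url}/{index}/_optimize?{query}",
        data=b"",  # empty body; the merge target is passed as a query parameter
        method="POST",
    )


def optimize(base_url, index):
    """Send the request; merging a large index can take a long time."""
    with urllib.request.urlopen(optimize_request(base_url, index)) as resp:
        return resp.status
```

A merge like this rewrites the segment files on disk, so it is probably best run while the index is not being written to heavily.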
