Elasticsearch version upgrade issue -- CorruptIndexException

Hi All,

I'm working on an ES upgrade from v0.20.5 to v1.2.1.
I tested on a 2-node cluster with 3 indices, ~4 million docs, 18 GB on disk,
20 shards, and 1 replica.
However, after bumping the version and restarting the cluster, I kept seeing
that some shards were damaged. The ES log said:
Caused by: org.apache.lucene.index.CorruptIndexException: did not read all
bytes from file: read 451 vs size 452 (resource:
BufferedChecksumIndexInput(MMapIndexInput(path="/18/index/_195c_i.del")))

This is blocking the version upgrade in my case.
Could anyone point me to the cause of this issue?
I'd really appreciate your help!

Wei

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e315ae9e-5c4a-43ab-a48f-3201dc52c6c9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

We did not run into this issue when we upgraded from 0.20.6 to 1.3.1.
Based on the upgrade docs, though, we did a few things to guard against
index corruption, which looks like what you ran into.

1 - We stopped all apps from writing to the indexes when we started our
upgrade.
2 - We flushed the cluster before bringing it down.
3 - We disabled shard allocation/replication before bringing the cluster
down (just to make sure each node brought back the indexes that were
already on that machine).
4 - When we brought everything back up, we ran optimize on each index.
The docs noted this as a task to do: the index formats changed in the
newer releases, so it was recommended to run optimize, which would
rewrite all of the indexes. It was unclear whether older indexes would
really work in an upgraded cluster, so we did not take the chance and
took the time hit to run optimize.
5 - We re-enabled shard allocation/replication.
6 - The cluster was working just fine.
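
The steps above can be sketched as a small script. This is only an illustration, not the exact commands we ran: the helper names (upgrade_calls, send) are made up for this example, and the cluster setting names are assumptions that should be checked against the upgrade docs for the versions involved. Step 1 (stopping writers) and the restart itself happen outside the script.

```python
import json
import urllib.request

# Assumed setting names; verify them against the docs for your ES versions.
DISABLE_ALLOCATION = {
    "persistent": {"cluster.routing.allocation.disable_allocation": True}
}
ENABLE_ALLOCATION = {
    "persistent": {
        "cluster.routing.allocation.disable_allocation": False,
        "cluster.routing.allocation.enable": "all",
    }
}


def upgrade_calls(phase, indices=()):
    """Return the REST calls for one phase as (method, path, body) tuples."""
    if phase == "before_shutdown":
        # Steps 2 and 3: flush, then stop shards from being reallocated.
        return [
            ("POST", "/_flush", None),
            ("PUT", "/_cluster/settings", DISABLE_ALLOCATION),
        ]
    if phase == "after_restart":
        # Step 4: optimize every index; step 5: re-enable allocation.
        calls = [("POST", "/%s/_optimize" % name, None) for name in indices]
        calls.append(("PUT", "/_cluster/settings", ENABLE_ALLOCATION))
        return calls
    raise ValueError("unknown phase: %r" % (phase,))


def send(base_url, method, path, body=None):
    """Send one call to the cluster and return the response text."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage would look something like `for call in upgrade_calls("before_shutdown"): send("http://localhost:9200", *call)`, with the host and port being placeholders for your own node.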

Hope our steps help you retry the cluster upgrade.


This is a bug in Lucene: LUCENE-5975, "Lucene can't read 3.0-3.3 deleted documents" (ASF JIRA).
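
Since the failing file in the report is a deletes file (`_195c_i.del`), one rough way to see which shards carry deleted documents is to list the `.del` files under the data directory. The sketch below is only an aid for that: the function name is made up for this example, and the presence of a `.del` file does not by itself say which Lucene version wrote the segment, so it cannot confirm that a shard is affected.

```python
import os


def find_deletes_files(data_dir):
    """Return sorted paths, relative to data_dir, of all *.del files."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(".del"):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, data_dir))
    return sorted(found)
```

Pass it the node's data path (wherever your installation keeps its shard directories) and it returns one entry per deletes file.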

Sorry it took a while, thanks for reporting this!
