Error: CorruptIndexException when reading from gateway

see attached files:
http://elasticsearch-users.115913.n3.nabble.com/file/n861253/info.zip

Hi,
I am testing the behavior of Elasticsearch at large scale.
My setup has 2 64-bit nodes with 8 CPUs each.
They are running version 0.71 of ES (as a service) and using an index
gateway in fs mode (NFS).
I have a single index with 5 shards, each shard has 2 replicas (5/1), so
I have 10 shards in total.
I have indexed 8.5 million documents.
Here is the disk usage of all shards and translogs in the gateway:
6.0G ./0/index
4.4M ./0/translog
6.2G ./1/index
2.3M ./1/translog
6.4G ./2/index
2.8M ./2/translog
6.9G ./3/index
1.4M ./3/translog
6.5G ./4/index
2.6M ./4/translog
32G .

I then did the following things

  1. stopped the indexing process
  2. stopped one of the es nodes
  3. waited about 3 minutes
  4. stopped the other node
  5. restarted the first node
  6. queried for the number of docs: curl -XGET
    'http://localhost:9200/en/_count?q=*:*'

I noticed the load on the machine was high (11-15, even now, 30 minutes
after the restart).
At first I got zero results.
Later, after maybe 10 minutes, I saw exceptions in the log (see below),
and I got only 6.5M docs - one shard is corrupted.

I got a CorruptIndexException in the log file

Also attached:
http://elasticsearch-users.115913.n3.nabble.com/file/n861253/info.zip
info.zip contains the cluster health, state info, nodes info and the full
log file.

[07:23:33,508][WARN ][indices.cluster ] [Leeds, Betty Brant]
Failed to start shard for index [en] and shard id [3]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [en][3] Failed to perform recovery of translog
        at org.elasticsearch.index.gateway.fs.FsIndexShardGateway.recoverTranslog(FsIndexShardGateway.java:381)
        at org.elasticsearch.index.gateway.fs.FsIndexShardGateway.recover(FsIndexShardGateway.java:111)
        at org.elasticsearch.index.gateway.IndexShardGatewayService.recover(IndexShardGatewayService.java:133)
        at org.elasticsearch.indices.cluster.IndicesClusterStateService$3.run(IndicesClusterStateService.java:342)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [en][3] Failed to open reader on writer
        at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:166)
        at org.elasticsearch.index.shard.service.InternalIndexShard.performRecovery(InternalIndexShard.java:407)
        at org.elasticsearch.index.gateway.fs.FsIndexShardGateway.recoverTranslog(FsIndexShardGateway.java:378)
        ... 6 more
Caused by: org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _1mfa: fieldsReader shows 334 but segmentInfo shows 3215
        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:282)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:578)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
        at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:568)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:150)
        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:36)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:405)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:372)
        at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:150)
        ... 8 more

First, in elasticsearch lingo, 5 shards with 1 replica each equals 10.

When a fresh node starts up, it will recover its state from the gateway.
This means copying the relevant index files and transaction log for the
relevant shards from it. In FS index storage with FS gateway, this means
copying over the files, and applying the content of the transaction log.
While this happens, if you query the cluster, you will get failures (check
the shard failures in the response) saying the shards are not ready yet.
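
For illustration only (the exact response fields may differ a bit in 0.x),
a count issued while shards are still recovering reports the failures in the
_shards section of the response, along these lines:

curl -XGET 'http://localhost:9200/en/_count?q=*:*'
{
  "count" : 0,
  "_shards" : {
    "total" : 5,
    "successful" : 0,
    "failed" : 5
  }
}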

This might take time, depending on your NFS, how big the index is, and so on.
I use native file copy when doing FS -> FS, which should be the fastest way
to do it, but it can still take a while. In order to know how long it took,
how long was spent on each stage, and so on, you should set the 'index.gateway'
logging to 'DEBUG' (right under 'action', same tabbing). Enable it to get a
feeling for how long these things take.
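
For reference, in the default config/logging.yml this would look roughly like
the following (the surrounding layout may differ slightly between versions):

logger:
  action: DEBUG
  # added: per-stage timings of gateway recovery
  index.gateway: DEBUG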

The corrupt index probably resulted from the files not being copied over
properly. This might have to do with the NFS configuration. Can you post it?
(Clinton, if you are out there: can you share your NFS configuration? It
seems to work well for you.)

-shay.banon


Hi Shay,
Here is the NFS configuration:
This is the mount entry on the client (from fstab):
obnas:/obstore/gateway /outbrain/elasticsearch/gateway nfs rsize=8192,wsize=8192,timeo=14,intr
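
If it helps, the options actually negotiated for that mount can be listed on
the client with, for example:

nfsstat -m
grep gateway /proc/mounts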

Please tell me if you need more specific details
thanks
yatir


Hiya

The corrupt index probably resulted from the files not being copied over
properly. This might have to do with the NFS configuration. Can you
post it? (Clinton, if you are out there: can you share your NFS
configuration? It seems to work well for you.)

Actually, I'm using the default settings for mounting the NFS partition -
nothing special there.

However, the NFS server needs to perform reasonably well - you shouldn't
use a cheap box for it. Especially during recovery or heavy indexing, a
slow box can cause delays.

Also, I'm setting some values in /etc/sysctl.conf which jgroups
recommended - not sure if it is still relevant with zen:

net.core.wmem_max = 655360
net.core.rmem_max = 26214400

And the only other thing I'm setting is the number of open files
allowed:

ulimit -n 20000
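
For completeness, making these stick across reboots is just the standard
Linux setup - a sketch, assuming the ES process runs as a user called
'elasticsearch' (adjust the user to your setup):

# /etc/sysctl.conf - reload with: sysctl -p
net.core.wmem_max = 655360
net.core.rmem_max = 26214400

# /etc/security/limits.conf - open files limit for the ES user
elasticsearch  soft  nofile  20000
elasticsearch  hard  nofile  20000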

clint



So in /etc/exports on the NFS server I have:

/opt/es_data 192.168.10.33(rw,root_squash,sync,no_subtree_check)

And in /etc/fstab on the client I have:

data1:/opt/es_data /opt/es_data nfs rw 0 0
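
Nothing ES specific about applying it - after editing /etc/exports the usual
steps are:

exportfs -ra        # on the NFS server, re-export everything in /etc/exports
mount /opt/es_data  # on the client, picks up the options from fstab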

clint
