Problem with recovery after power failure


(thinusp) #1

I am having serious problems with recovering my index after a power
failure. The setup I have is as follows:

  • 1 master-only node
  • 4 data-only nodes
  • Around 20 million documents indexed, over 8 shards with replication factor 2.

After a power failure, I make sure to start the master node first.
Thereafter, I start all the data nodes as quickly as possible. In the
HEAD plugin the status shows as RED, and it literally takes hours to
recover. The document count will have to be much higher than 20 million
before I can take this into production.

I have ensured that the open file limit (ulimit -n) is 32000 on all
machines, and I can have ElasticSearch verify this for me. When viewing
the logs, I see the following message thrown repeatedly on all of the data
nodes:

[2012-04-04 17:27:22,003][WARN ][indices.cluster ] [ClusterPC2] [myindex][6] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [index][6] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:228)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [myindex][6] Failed to open reader on writer
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:279)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:579)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:175)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
    ... 3 more
Caused by: java.io.FileNotFoundException: /data1/elasticsearch/elasticsearchcluster/nodes/0/indices/myindex/6/index/_nyo.tii (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:458)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:113)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:76)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:663)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:157)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:38)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:345)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildNrtResource(RobinEngine.java:1365)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:263)
    ... 6 more

I literally get thousands of these entries in the logs, several gigs'
worth if I don't stop the process. On the master PC I get this message:

[2012-04-04 17:29:31,613][WARN ][cluster.action.shard ] [ClusterPC2]
received shard failed for [myindex][5], node[ufvdnrd-QMm0eSEBNowUag], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[myindex][5] failed recovery]; nested:
EngineCreationFailureException[[myindex][5] Failed to open reader on
writer]; nested:
FileNotFoundException[/data2/elasticsearch/elasticsearchcluster/nodes/0/indices/myindex/5/index/_91x.prx
(Too many open files)]; ]]

To solve the problem, I have to restart all the data node services (I use
the service wrapper), so I run "service elasticsearch restart" on all my
data node PCs. Very soon after this, the HEAD plugin shows the status as
yellow, and shards are being initialised and copied. This process still
takes several hours though, and I suspect that's not normal. Is it
possible that the structure somehow got corrupted during that initial
frenzy of "file could not be found" errors? Each shard is about 16GB in
size at the moment, though it will surely grow quite a bit still.

Any advice? Hopefully I'm doing something wrong here...

Thanks, Thinus


(Shay Banon) #2

It seems like when you started the process, it did not have the correct
file descriptor limit set, and that's why it was in a red state (it tried
to start a shard, got the "too many open files" error, and failed).

Once you get to yellow status (after you fixed the file descriptor
problem), things are good, meaning all the primary shards are allocated.
Once that's done, replicas are allocated; some may need to resync with
the primary, which can take time (in the future, we may be able to make
this shorter). There is also throttling going on so as not to overload
the network / nodes with too many recoveries at once.
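The recovery throttling mentioned here is configurable. As a rough sketch (these setting names are taken from later Elasticsearch documentation and have changed between versions, so verify them against the docs for the release you run), the relevant knobs in elasticsearch.yml look like:

```yaml
# Illustrative values only -- tune for your hardware.
# How many concurrent shard recoveries a single node will perform:
cluster.routing.allocation.node_concurrent_recoveries: 2
# Bandwidth cap for peer-recovery traffic per node:
indices.recovery.max_bytes_per_sec: 40mb
```

Raising these can shorten the yellow-to-green phase at the cost of more load on the network and disks during recovery.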



(thinusp) #3

Thanks Shay,

I realise that this seems to be the case, but as far as I can tell the
limits are not the problem, as I've set them as part of the system
configuration (in /etc/security/limit.conf). So as far as I understand,
they should already be in effect by the time the service starts? Or am I
mistaken?

Just to be clear: I solved the limits problem a couple of days ago
already, and the server has been restarted quite a few times since then,
with indexing happening and everything in between. It is not the case that
I noticed the exceptions above, set the limits accordingly, and only then
saw the yellow status. The limits have always been like this.

Thanks again.

PS. Running CentOS 6.2.



(Shay Banon) #4

It's simple to test: use the nodes info API to see whether the limit has
been applied; it reports the max file descriptors the process was started
with. I don't think it was applied properly on the first run, where you
got the failures.
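A quick way to run that check from a shell (host and port are the defaults; the exact endpoint and response layout differ between ES versions, so treat this as a sketch):

```shell
# What limit the current shell would hand to a child process:
ulimit -n

# Ask the running nodes what they actually got; look for
# "max_file_descriptors" under "process" in the JSON response.
# The "|| true" keeps the script going if no node is up.
curl -s 'http://localhost:9200/_nodes?process=true&pretty=true' || true
```

If the node reports the distribution default (often 1024) instead of 32000, the limit from limits.conf never reached the process.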


(thinusp) #5

Right, so this is what I found:

It seems that when the service starts at boot, the limits are not set
correctly. When I then restart the service, it picks up the correct
limits. I'll have to figure out why my limits are not configured at boot
time, or at least by the time the service starts.
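The likely mechanism, for anyone hitting this later: /etc/security/limits.conf is applied by pam_limits, so it only affects PAM login sessions. A daemon started directly by init at boot never goes through PAM and can miss those limits entirely, which would explain why a manual restart from an SSH session picks them up while the boot-time start does not. A hedged sketch of the usual fix, using the 32000 value from this thread:

```shell
# In the init/service-wrapper script, raise the open-file limit
# explicitly before the JVM is launched, instead of relying on
# /etc/security/limits.conf (which init does not read):
ulimit -n 32000 2>/dev/null || true  # guarded: raising it may require root
echo "effective limit: $(ulimit -n)"
```

After changing the script, the nodes info API check above the restart is the easiest way to confirm the JVM actually inherited the new limit.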

Thanks for the help!

Regards,
Thinus



(Holger Eilhard) #6

Hi,

have you figured out how to fix this problem? We are running some ES
boxes on CentOS 6.2 as well, and we are hitting the same issue.

Thanks in advance!

Cheers,
Holger

On Wednesday, April 4, 2012 6:10:05 PM UTC+2, Thinus Prinsloo wrote:

Right, so this is what I found:

It seems when the service starts at boot, the limits are not set
correctly. Then when I restart the service, it sets the correct limits.
I'll have to figure out why my limits are not configured at boot time, or
at least by the time the service starts.

Thanks for the help!

Regards,
Thinus

On Wed, Apr 4, 2012 at 5:52 PM, Shay Banon kimchy@gmail.com wrote:

Its simple to test it, use nodes info API to see if it has been applied
or not, it gives you the max file descriptors the process was started with.
I don't think it was applied properly on the first run where you got the
failures.

On Wed, Apr 4, 2012 at 6:50 PM, Thinus Prinsloo <
thinus.prinsloo@gmail.com> wrote:

Thanks Shay,

I realise that this seems to be the case, but as far as I can tell the
limits is not the problem, as I've set them as part of the system
configuration (in /etc/security/limit.conf). So as far as I
understand, by the time the service starts this should be configured as
such already? Or am I mistaken?

So just to be clear, the limits problem I've solved a couple days ago
already, and the server has been restarted quite a few times since then,
with indexing happening and everything in between. It is not the case that
I noticed the problem with the exceptions above, set the limits
accordingly, and only then saw the yellow status. The limits have always
been like this.

Thanks again.

PS. Running CentOS 6.2.

On Wednesday, 4 April 2012 17:43:20 UTC+2, kimchy wrote:

It seems like when you started the process, it did not have the correct
file descriptor limit set, and that's why it was in the red state (it tried
to start a shard, got the "too many open files" error, and failed).

Once you get to yellow status (after you fixed the file descriptor
problem), things are good, meaning all the primary shards are allocated.
Once that's done, replicas are allocated; some may need to resync with
the primary, which might take time (in the future, we may be able to make
this shorter). There is also throttling going on so as not to overload the
network / nodes with too many recoveries at once.

On Wed, Apr 4, 2012 at 6:37 PM, Thinus Prinsloo wrote:

I am having serious problems with recovering my index after a power
failure. The setup I have is as follows:

  • 1 master-only node
  • 4 data-only nodes
  • Around 20 million documents indexed over 8 shards with replication
    factor 2.

After the power failure, I make sure to start the master node first.
Thereafter, I start all the data nodes as quickly as possible. Using the
HEAD plugin I can see the status as RED, and it literally takes hours to
recover. The document count will have to be much higher than 20 million
before I can take this into production.

I have ensured that the open-file limit (ulimit -n) is 32000 on all
machines, and I can have ElasticSearch verify this for me. When viewing
the logs, I get the following message thrown repeatedly on all of the data
nodes:

[2012-04-04 17:27:22,003][WARN ][indices.cluster ] [ClusterPC2] [myindex][6] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [index][6] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:228)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [myindex][6] Failed to open reader on writer
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:279)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:579)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:175)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
    ... 3 more
Caused by: java.io.FileNotFoundException: /data1/elasticsearch/elasticsearchcluster/nodes/0/indices/myindex/6/index/_nyo.tii (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:458)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:113)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:76)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:663)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:157)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:38)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:345)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildNrtResource(RobinEngine.java:1365)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:263)
    ... 6 more

I literally get thousands of these entries in the logs, several gigabytes'
worth if I don't stop the process. On the master PC I get this message:

[2012-04-04 17:29:31,613][WARN ][cluster.action.shard ] [ClusterPC2] received shard failed for [myindex][5], node[ufvdnrd-QMm0eSEBNowUag], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[myindex][5] failed recovery]; nested: EngineCreationFailureException[[myindex][5] Failed to open reader on writer]; nested: FileNotFoundException[/data2/elasticsearch/elasticsearchcluster/nodes/0/indices/myindex/5/index/_91x.prx (Too many open files)]; ]]

To solve the problem, I have to restart all the data-node services (I
use the service wrapper), so I enter "service elasticsearch restart"
on all my data-node PCs. Very soon after this, the HEAD plugin shows the
status as yellow, and shards are being initialised and copied. This
process still takes several hours though, and I suspect that's not normal.
Is it possible that the structure somehow got corrupted during that initial
frenzy of "file could not be found" errors? Each shard is about 16GB in
size at the moment, though it will surely grow quite a bit still.
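While waiting through a long recovery like that, progress can be watched with the cluster health API, which already existed in this era of ElasticSearch (host/port assumed):

```shell
# Watch shard allocation progress during recovery.
curl -s 'http://localhost:9200/_cluster/health?pretty=true'
# "status", "initializing_shards", "relocating_shards" and
# "unassigned_shards" show how far along the recovery is.
```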

Any advice? Hopefully I'm doing something wrong here...

Thanks, Thinus

--
Thinus Prinsloo
E-mail: thinus.prinsloo@gmail.com
Cell: +27 82 339 2226


(thinusp) #7

The way I solved the problem was to not run ES as a service with the
service wrapper. For some reason (I know too little about Linux to
understand this) the file limits are not configured correctly before the
service starts. So I launch it at startup myself, after setting the
ulimits. Then it works like a charm.

Hope this helps for you as well.
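A minimal sketch of that workaround (the install path is an assumption): raise the limit in the same shell that launches the daemon, so the ElasticSearch process inherits it as a child:

```shell
#!/bin/sh
# Hypothetical launch script run at boot (e.g. from rc.local):
# set the open-file limit in this shell first...
ulimit -n 32000 || echo "warning: could not raise nofile limit" >&2
# ...then start ElasticSearch; child processes inherit the limit.
exec /opt/elasticsearch/bin/elasticsearch
```

This sidesteps the PAM/limits.conf question entirely, since the limit is set explicitly rather than inherited from a login session.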

On Wed, May 23, 2012 at 10:43 AM, Holger Eilhard holger@eilhard.net wrote:


--
Thinus Prinsloo
E-mail: thinus.prinsloo@gmail.com
Cell: +27 82 339 2226


(system) #8