Elasticsearch on Debian Squeeze problem (too many open files)

Hello,

I am running elasticsearch 0.19.11 on Debian Squeeze installed via apt-get.

I set the nofile limit in /etc/security/limits.conf to 65535 for the
elasticsearch user, and elasticsearch ran successfully for a few days (in
combination with logstash and kibana).

Afterwards I noticed that elasticsearch had started failing with the
following messages in the logs:

[2012-12-12 18:33:31,239][WARN ][indices.cluster ] [Ganymede]
[logstash-2012.12.27][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[logstash-2012.12.27][3] shard allocated for local recovery (post api),
should exists, but doesn't
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[2012-12-12 18:33:31,767][WARN ][cluster.action.shard ] [Ganymede]
sending failed shard for [logstash-2012.12.27][3],
node[8_tqdqxMTB6an-abAdSTdA], [P], s[INITIALIZING], reason [Failed to start
shard, message [IndexShardGatewayRecoveryException[[logstash-2012.12.27][3]
shard allocated for local recovery (post api), should exists, but doesn't]]]
[2012-12-12 18:33:31,767][WARN ][cluster.action.shard ] [Ganymede]
received shard failed for [logstash-2012.12.27][3],
node[8_tqdqxMTB6an-abAdSTdA], [P], s[INITIALIZING], reason [Failed to start
shard, message [IndexShardGatewayRecoveryException[[logstash-2012.12.27][3]
shard allocated for local recovery (post api), should exists, but doesn't]]]
[2012-12-12 18:33:32,132][WARN ][indices.cluster ] [Ganymede]
[logstash-2012.12.27][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[logstash-2012.12.27][3] shard allocated for local recovery (post api),
should exists, but doesn't
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

Followed by:

[2012-12-12 18:33:31,239][WARN ][indices.cluster ] [Ganymede]
[logstash-2012.12.27][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[logstash-2012.12.27][3] shard allocated for local recovery (post api),
should exists, but doesn't
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[2012-12-12 18:33:31,767][WARN ][cluster.action.shard ] [Ganymede]
sending failed shard for [logstash-2012.12.27][3],
node[8_tqdqxMTB6an-abAdSTdA], [P], s[INITIALIZING], reason [Failed to start
shard, message [IndexShardGatewayRecoveryException[[logstash-2012.12.27][3]
shard allocated for local recovery (post api), should exists, but doesn't]]]
[2012-12-12 18:33:31,767][WARN ][cluster.action.shard ] [Ganymede]
received shard failed for [logstash-2012.12.27][3],
node[8_tqdqxMTB6an-abAdSTdA], [P], s[INITIALIZING], reason [Failed to start
shard, message [IndexShardGatewayRecoveryException[[logstash-2012.12.27][3]
shard allocated for local recovery (post api), should exists, but doesn't]]]
[2012-12-12 18:33:32,132][WARN ][indices.cluster ] [Ganymede]
[logstash-2012.12.27][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[logstash-2012.12.27][3] shard allocated for local recovery (post api),
should exists, but doesn't
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:122)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

Any advice or ideas on what is going on?

Thanks!
OD

--

Oops, I meant to say, followed by:

org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[logstash-2012.12.08][4] failed recovery
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:228)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException:
[logstash-2012.12.08][4] failed to open reader on writer
at
org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:287)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:547)
at
org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:190)
at
org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
... 3 more
Caused by: java.io.FileNotFoundException:
/var/data/elasticsearch/elasticsearch/nodes/0/indices/logstash-2012.12.08/4/index/_2i.fdt
(Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:71)
at
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:98)
at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
at
org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
at
org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:537)
at
org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:131)
at
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:234)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:118)
at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:696)
at
org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:654)
at
org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:142)
at
org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:36)
at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:451)
at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:296)
at
org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:82)
at
org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1428)
at
org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:271)
... 6 more
[2012-12-12 20:49:50,906][WARN ][cluster.action.shard ] [Daisy Johnson]
sending failed shard for [logstash-2012.12.08][4],
node[NfCyiZMxTo-29Ibp-yjAhw], [P], s[INITIALIZING], reason [Failed to start
shard, message [IndexShardGatewayRecoveryException[[logstash-2012.12.08][4]
failed recovery]; nested:
EngineCreationFailureException[[logstash-2012.12.08][4] failed to open
reader on writer]; nested:
FileNotFoundException[/var/data/elasticsearch/elasticsearch/nodes/0/indices/logstash-2012.12.08/4/index/_2i.fdt
(Too many open files)]; ]]
[2012-12-12 20:49:50,906][WARN ][cluster.action.shard ] [Daisy Johnson]
received shard failed for [logstash-2012.12.08][4],
node[NfCyiZMxTo-29Ibp-yjAhw], [P], s[INITIALIZING], reason [Failed to start
shard, message [IndexShardGatewayRecoveryException[[logstash-2012.12.08][4]
failed recovery]; nested:
EngineCreationFailureException[[logstash-2012.12.08][4] failed to open
reader on writer]; nested:
FileNotFoundException[/var/data/elasticsearch/elasticsearch/nodes/0/indices/logstash-2012.12.08/4/index/_2i.fdt
(Too many open files)]; ]]

Thanks!
OD
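
A note for anyone debugging the same "(Too many open files)" error: one thing
worth checking is whether the nofile value from /etc/security/limits.conf
actually reached the running elasticsearch process. limits.conf is applied by
pam_limits, so a daemon started from an init script may not pick it up. On
Linux the effective limits of a process are listed in /proc/<pid>/limits. The
following is only a minimal sketch under those assumptions; the function names
are made up for illustration, and the PID of the elasticsearch JVM has to be
supplied by hand:

```python
import os
import re


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because a limit can also read 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces: name, soft, hard, units.
            fields = re.split(r"\s{2,}", line.strip())
            return fields[1], fields[2]
    raise ValueError("no 'Max open files' row found")


def open_fd_count(pid):
    """Count descriptors currently held by the process.

    Reading /proc/<pid>/fd requires being root or the owner of the process.
    """
    return len(os.listdir("/proc/%d/fd" % pid))


def report(pid):
    """Print the effective nofile limits next to the current descriptor count."""
    with open("/proc/%d/limits" % pid) as f:
        soft, hard = parse_open_files_limit(f.read())
    print("pid %d: soft=%s hard=%s open=%d" % (pid, soft, hard, open_fd_count(pid)))
```

Calling report() with the PID of the elasticsearch process shows whether the
soft limit really is 65535 and how close the process is to it. If the soft
limit is much lower than the configured value, the limits.conf entry is
probably not being applied to the way the service is started.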

On Wednesday, December 12, 2012 8:37:28 PM UTC-6, Ognen Duzlevski wrote:

--
