We're using elasticsearch 0.16.3. While reindexing our data (currently
around 200k small documents), we're running into the 'Too many open files'
problem.
We're issuing a large number of single index requests in parallel
(around 150-200 per second) using Resque. It works fine for a short
time, then we start seeing connection resets and broken pipes, all
caused by 'Too many open files'.
The problem doesn't occur when we reduce the number of indexing workers.
Is this a problem with elasticsearch or with Netty? Would it be possible
to run elasticsearch inside Tomcat, and could that help with this issue?
Can we tweak elasticsearch options to increase stability? We tried
reducing the flush interval and decreasing the Lucene merge factor, but
this didn't help.
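For reference, this is roughly what we changed (setting names are from our
elasticsearch.yml as I remember them for 0.16.x, so please treat this as an
illustration rather than our exact config):

  # excerpt from elasticsearch.yml -- values we experimented with
  index.translog.flush_threshold_period: 5s   # flush the translog more often
  index.merge.policy.merge_factor: 5          # lower Lucene merge factor, so fewer segments stay open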
Please let us know if we can provide more information about our setup
that might be helpful in diagnosing the issue.
Thanks,
Andreas
Stacktrace example of a too many open files error:
[2011-07-19 12:45:12,408][WARN ][index.shard.service ] [Boneyard] [nodes][2] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [nodes][2] Refresh failed
  at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:644)
  at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:403)
  at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:628)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.FileNotFoundException: /home/moviepilot/data/elasticsearch/sheldon-index/nodes/0/indices/nodes/2/index/_2dg.fdx (Too many open files)
  at java.io.RandomAccessFile.open(Native Method)
  at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
  at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
  at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
  at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
  at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
  at org.elasticsearch.index.store.support.AbstractStore$StoreDirectory.openInput(AbstractStore.java:344)
  at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
  at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:290)
  at org.apache.lucene.index.SegmentReader.openDocStores(SegmentReader.java:600)
  at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:693)
  at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:642)
  at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:155)
  at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:38)
  at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:455)
  at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:403)
  at org.apache.lucene.index.DirectoryReader.doReopenFromWriter(DirectoryReader.java:405)
  at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:418)
  at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:383)
  at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:627)
The errors started when lsof | wc -l was around 4000 and lsof | grep
elasticsearch | wc -l was around 900. I'm not sure about netstat -a | wc -l,
but I'm pretty sure it was below 1000.
I'll run more tests, watching netstat and trying to tweak settings in
/proc/sys/net.
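Next time I'll also watch the elasticsearch process itself instead of only the
global counts, roughly like this (assuming a single elasticsearch JVM on the
box; commands written from memory, so double-check them before relying on them):

  ES_PID=$(pgrep -f elasticsearch | head -n1)          # assumes exactly one elasticsearch process
  grep 'open files' /proc/$ES_PID/limits               # the limit the process actually runs with
  ls /proc/$ES_PID/fd | wc -l                          # descriptors currently open
  netstat -anp 2>/dev/null | grep "$ES_PID/" | wc -l   # sockets held by the process (needs root)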
--
Andreas Bauer
Moviepilot GmbH | Mehringdamm 33 | 10961 Berlin | Germany | Tel: +49
30 616 512-0
Sitz der Gesellschaft: Berlin, Deutschland | Handelsregister:
Amtsgericht Berlin-Charlottenburg, HRB Nr. 107195 B | Geschäftsführer:
Tobias Bauckhage, Malte Cherdron
We keep having a very similar issue: whenever I bring up a new instance
on AWS, it bails out with "too many open files" right away.
So far I've always checked with lsof, netstat, etc., but saw nothing
out of the ordinary. After checking various things for 10-15 minutes,
I start elasticsearch again and then it continues to work. I'm not
sure if I just hit some sort of random capacity notch with AWS, or how
these things are related.
Anyway, one thing I noticed is that the suggested open-files limit of
32000 in the service-wrapper doesn't work for us at all, and we're
currently writing to the index with a single thread. The default (for
the root user) of 65xxx works much better for us.
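In case it helps, this is roughly how we bump the limit now (the user name and
number are specific to our setup, and the limits.conf entries only take effect
for new logins with pam_limits enabled):

  # /etc/security/limits.conf
  elasticsearch  soft  nofile  65535
  elasticsearch  hard  nofile  65535

  # or, when starting elasticsearch by hand from a shell:
  ulimit -n 65535 && bin/elasticsearch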
I guess you're on AWS as well, right? Are you guys still on Karmic too,
or are you using Scalarium's Lucid image?
The number of open files depends on many factors: the size of the cluster
(sockets), the number of clients connected (sockets), and the number of
shards allocated on the node (index files), so it's hard to tell where it's
coming from...
In 0.17, the maximum open files limit is exposed in nodes info and the
current number of open files in nodes stats; hopefully that will help us see
what's going on...
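Once on 0.17, something along these lines should show both values (endpoints
and output fields quoted from memory here, so please double-check them against
the docs):

  curl 'http://localhost:9200/_cluster/nodes?pretty=true'        # nodes info, includes the max open files limit
  curl 'http://localhost:9200/_cluster/nodes/stats?pretty=true'  # nodes stats, includes currently open files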