Too many files open


(thinusp) #1

Could someone perhaps help me troubleshoot this problem? Over the weekend
I ran a test on my 5-node cluster, which started failing after a while.
After going through the logs (which came to around 9 GB of data...) I
found this error being thrown repeatedly:

[2012-03-26 14:00:15,270][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to accept a connection.
java.io.IOException: Too many open files
	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
	at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:236)
	at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)

It seems there is some problem with "Too many open files", but that
cannot be the case, or at least not the way I understand it. I set the
file limit for the user running the Elasticsearch instance to 32000, as
suggested on the website, and even when I check with "lsof | wc -l" there
are only about 6000 descriptors open in total, so I doubt that is really
the problem. What bugs me (though I admittedly do not know much about
these things) is that the error comes from a "NioServerSocketPipelineSink"
connection, and I am not entirely sure how that relates to disk I/O.
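For what it's worth, counting descriptors for the Elasticsearch process directly via /proc can be more telling than a global "lsof | wc -l", since lsof may over-count (one entry per thread on some versions) or miss processes belonging to other users. The pgrep pattern below is an assumption; adjust it to however your node is launched:

```shell
# Find the Elasticsearch JVM (pattern is a guess; adjust for your setup).
ES_PID=$(pgrep -f elasticsearch | head -n 1)

# Number of file descriptors the process currently holds.
ls "/proc/$ES_PID/fd" | wc -l

# The limit the kernel is actually enforcing for that process.
grep 'Max open files' "/proc/$ES_PID/limits"
```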

Another seemingly related issue is the following error which is also thrown
into the mix:

org.elasticsearch.index.engine.IndexFailedEngineException: [dev-index][1] Index failed for [info#< omitted >]
	at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:482)
	at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:323)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:158)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:529)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:427)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.FileNotFoundException: /data2/esc/cluster0/nodes/0/indices/dev-index/1/index/_j3.fdx (Too many open files)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
	at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
	at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:418)
	at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:390)
	at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:84)
	at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:65)
	at org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:108)
	at org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:152)
	at org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1404)
	at org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1424)
	at org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1043)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2066)
	at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:565)
	at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:477)
	... 7 more

Again there's that "Too many open files" message. Any idea what might be
causing this? Am I using the Java API incorrectly? Thanks for the help -
I appreciate it.

  • Thinus

(Thomas Peuss) #2

Hi Thinus!

On Monday, 26 March 2012 14:11:08 UTC+2, Thinus Prinsloo wrote:

java.io.IOException: Too many open files
	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
	at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:236)
	at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)

You need to raise the "ulimit" for the user that is running ES. Where you
have to do this depends on your Linux distribution; on RedHat you can add
a file under /etc/security/limits.d.

Ours looks like this:
elastic - memlock unlimited
elastic soft nofile 80000
elastic hard nofile 100000

Our user is called "elastic". The first line allows the ES JVM to lock as
much memory as it wants (memory-mapped files count as memory as well!).
When the user reaches the soft limit, a warning is written to the log.
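One quick way to confirm that the limits.d entry is actually being applied (user name "elastic" as above; PAM applies limits.d only to login-style sessions, hence `su -`):

```shell
# Open a login shell as the service user and print the effective limits.
su - elastic -c 'ulimit -Sn; ulimit -Hn'   # soft, then hard nofile limit
```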

CU
Thomas


(Shay Banon) #3

Also, check the nodes info and nodes stats APIs; they report the max open
file descriptor limit and the current number of open files. Verify through
the nodes info API that your setting has actually taken effect.
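For reference, on the 0.x releases current at the time the requests looked roughly like this; the endpoints and the `max_file_descriptors` field name are my best recollection for that era and vary across versions, so check the documentation for yours:

```shell
# Node info: reports the configured max open file descriptor limit.
curl -s 'http://localhost:9200/_cluster/nodes?process=true&pretty=true'

# Node stats: reports the number of currently open file descriptors.
curl -s 'http://localhost:9200/_cluster/nodes/stats?process=true&pretty=true'

# Pull out just the limit field from the info response:
curl -s 'http://localhost:9200/_cluster/nodes?process=true' \
  | grep -o '"max_file_descriptors"[^,}]*'
```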



(thinusp) #4

Thanks - that's basically what I did. I realised that the configuration I
had used to set the limits did not work, so I added the setting to the
Elasticsearch launch script instead. I discovered this while verifying
through the API, as you suggested. What made it difficult was that even
after the API reported the right limit, I still got the error, and the
number of file descriptors associated with that thread was nowhere near
the limit. All I could gather is that the earlier failure from running out
of file descriptors had left the index severely corrupted (not sure if it
can happen like that), so this time, although it showed the same error
message, it genuinely could not find the file. I ended up deleting some
shards and letting ES recover until the problem was sorted. It was a bit
messy though...

Bottom line: make sure the setting is actually working by checking the
API, and don't let the cluster keep trying to index millions of documents
once that error occurs. Rather stop... :slight_smile:


--
Thinus Prinsloo
E-mail: thinus.prinsloo@gmail.com
Cell: +27 82 339 2226


(q42jaap) #5

Hi Shay,

in 0.19.9, the node stats don't seem to mention the max open file
descriptor limit. Do you have documentation somewhere on which request I
have to make (with curl, for example)?

Thanks,

Jaap



(q42jaap) #6

Shay,
Maybe you could also update the docs:
http://www.elasticsearch.org/guide/reference/setup/installation.html
to point at some of the tutorials that mention the service wrapper. I
almost tried to configure the service wrapper with Elasticsearch myself,
but luckily found your GitHub project
(https://github.com/elasticsearch/elasticsearch-servicewrapper).

Jaap



(system) #7