Too many open files exception even after raising the open file limit

All,

I'm aware of the known issue with the limit of file descriptors, so
when I first got this issue I upped the limit. I kept getting the
exception, so I kept upping it. As an example, here is what ulimit -a
returns:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

I've even tried cranking it up to 300K, and I still get the following
error:

  1. Error injecting constructor, java.io.IOException: directory
     '/opt/elasticsearch-0.15.2/data/elasticsearch/nodes/0/indices/account_26/0/index'
     exists and is a directory, but cannot be listed: list() returned null
       at org.elasticsearch.index.store.fs.NioFsStore.<init>(NioFsStore.java:50)
     while locating org.elasticsearch.index.store.fs.NioFsStore
       at org.elasticsearch.index.store.StoreModule.configure(StoreModule.java:
     while locating org.elasticsearch.index.store.Store
     for parameter 3 at org.elasticsearch.index.shard.service.InternalIndexShard.<init>(InternalIndexShard.java:108)
     while locating org.elasticsearch.index.shard.service.InternalIndexShard
       at org.elasticsearch.index.shard.IndexShardModule.configure(IndexShardModule.java:39)
     while locating org.elasticsearch.index.shard.service.IndexShard
     for parameter 3 at org.elasticsearch.index.gateway.IndexShardGatewayService.<init>(IndexShardGatewayService.java:74)
       at org.elasticsearch.index.gateway.IndexShardGatewayModule.configure(IndexShardGatewayModule.java:40)
     while locating org.elasticsearch.index.gateway.IndexShardGatewayService

Or sometimes I just get the 'too many open files' exception. Once this
happens, the cluster is dead: I have to stop the process, delete the data
directory, and restart it. When I try indexing again, I get the same
error at the same record count. This is only about 80K records, with
a small fraction of the number of fields I will likely eventually
need, so it seems like it should be fine. Also, lsof | wc -l is
showing a reasonable number (less than 10K), so I'm at a loss.
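
For what it's worth, counting descriptors for just the elasticsearch
process (a sketch, with the PID left as a placeholder) would be:

lsof -p <es-pid> | wc -l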

What's even weirder is that when I run elasticsearch as a local node
in-process (in the same JVM rather than starting it in a separate JVM),
I am able to index the same number of records without any issues. I'm
using Ubuntu; is there some kind of limit somewhere else I'm missing?
I'm at a bit of a loss.

Thanks in advance,

Lucas

Make sure that the increased open file limit actually applies to the elasticsearch process you start. You can start the script with a flag that logs the number of files ES can open on startup:

bin/elasticsearch -Des.max-open-files=true -f
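
If the node is already running, another way (on Linux, with the node's
PID left as a placeholder) to see the limit the process actually got is
to read it from /proc:

cat /proc/<es-pid>/limits | grep 'Max open files'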


Thanks for the tip. The output I'm getting is:

[2011-05-28 20:45:10,721][INFO ][bootstrap ]
max_open_files [998]

Which explains the issue. And since the code in FileSystemUtils looks
like it just opens files in the tmp dir until it gets an IOException,
it's certainly more reliable than what ulimit is telling me.

So, my first thought was that my more general approach in
/etc/security/limits.conf was the problem. I basically had this:

* - nofile 100000

The '*' is supposed to match every user, and '-' should mean both hard
and soft. Obviously it isn't working, though. I was running elasticsearch
as root to see if that would help. Obviously not something I would want
to do permanently, but I'm grasping at straws a bit. I still had the same
issue. So I created a group and user both named 'elasticsearch' and then
added them explicitly to limits.conf:

@elasticsearch hard nofile 32000
@elasticsearch soft nofile 32000
elasticsearch hard nofile 32000
elasticsearch soft nofile 32000

I switched to the elasticsearch user after doing a 'chown
elasticsearch:elasticsearch' and ran the command you mentioned.
However, there was no effect; I'm still getting 998 as the limit when
starting elasticsearch. This is running on the standard Ubuntu Server
10.10 32-bit on Amazon EC2, which shouldn't be a problem, as that seems
like a very common use case from what I can tell on this mailing list.
Has anyone else using a similar setup had this same problem? Did I
just do something wrong in limits.conf?


We had the very same problem with the wildcard syntax in limits.conf on
the default AMI for Elastic Beanstalk's flavor of "Amazon Linux".

For us, this syntax in limits.conf did the trick:

tomcat - nofile 32000

Our Elasticsearch node is embedded in a web application, hence the tomcat
username.
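
A quick way to confirm the new limit applies to that account (a sketch,
forcing a shell since the service user may not have a login shell) is:

sudo su -s /bin/sh -c 'ulimit -n' tomcat

which should print 32000 once the entry is in place, assuming pam_limits
is applied for su sessions.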

-- jim


Interesting. I added the following to my limits.conf file:

root hard nofile 32000
root soft nofile 32000

I then ran it as root, which worked. I still have no idea why running it
as the elasticsearch user doesn't work, though.



I encountered this same problem on Ubuntu Precise and discovered the solution.

I would check the limit like so:

sudo -u elasticsearch sh -c 'ulimit -a'

That correctly returned the number I'd set in limits.conf (64000).

I tried the max-open-files debug trick but found it entirely unhelpful; it returned 'max_files [0]'. Not good. Instead I ran the following:

curl -o - http://localhost:9200/_nodes?process

and I could see that it returned 'max_file_descriptors': 4096.
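
For reference, a quick way to pull out just that field (a sketch, assuming the pretty-printed JSON form) is something like:

curl -s 'http://localhost:9200/_nodes?process&pretty' | grep max_file_descriptors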

So that confirmed that elasticsearch wasn't getting all the file handles as configured.

I discovered the problem has to do with the way in which elasticsearch is invoked. In /etc/init/elasticsearch.conf you can see that it is run inside dash:

su -s /bin/dash -c "/usr/bin/elasticsearch -f" elasticsearch

Running the same kind of invocation but with ulimit, I got back 4096:
sudo su -s /bin/sh -c 'ulimit -a' elasticsearch

So the problem must be to do with dash.

I discovered that I needed to add the following to /etc/pam.d/common-session for dash:
session required pam_limits.so

I had only added it to /etc/pam.d/common-session-noninteractive (which worked for sh). It's probably safest to add it to both, although only common-session should be needed.
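
To double-check (a sketch, using the same su-style invocation as the init script), re-running

sudo su -s /bin/dash -c 'ulimit -n' elasticsearch

should now report the value from limits.conf instead of 4096.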

This fixed my problem.