Too many open files but nofile set to 256000

I am sure this must be obvious, but I am missing it.

I have cranked the number of open files in /etc/security/limits.conf to
256000 for soft and hard. Elasticsearch confirms on startup that
nofile=256000.

After an hour or so, one of the ES servers starts reporting too many open
files.
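Before anything else, it is worth confirming the limit the kernel actually enforces on the running process, not just what limits.conf says — PAM only applies limits.conf to sessions that pass through it, so a daemon started at boot can miss the setting entirely. A quick sketch (the pgrep pattern is an assumption; adjust it to however your ES process appears):

```shell
# Read the enforced limit straight from /proc for the running process.
# Falls back to the current shell's PID so the command runs on any Linux box.
ES_PID=$(pgrep -f elasticsearch | head -n1)
grep 'Max open files' "/proc/${ES_PID:-$$}/limits"
```

If this prints a number lower than 256000, the limit never reached the process.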

This is the setup:

6 Servers
4 Servers running ES with data
2 Servers running ES with no data (just routers)

There are two indexes:

Index 1 - 10 shards, 1 replica (i.e. one primary + one replica copy) - 15GB
index (30GB including the replica)
Index 2 - 1 shard, 0 replicas - 525MB

I have checked the number of open files for ES and it's only ever around
50-200, so we should be OK. I have been cranking up the nofile setting for
the past few weeks but it makes no difference.

Any ideas?

--

Maybe you should check with:
cat /proc/<es pid>/limits | grep files

Setting the limit in limits.conf alone is not always enough.

Sent from my Windows Phone

From: Marietta
Sent: 25/08/2012 2:12 AM
To: elasticsearch@googlegroups.com
Subject: Too many open files but nofile set to 256000


--


Can you double check that the max open files setting is actually applied, using the node info API (with the process flag set)? If it's set, can you gist the lsof output?

On Aug 24, 2012, at 9:12 PM, Marietta mariettavps@gmail.com wrote:


--



Just looking at my deploy scripts, I have the following in /etc/sysctl.d/60-maxfiles.conf:

fs.file-max = 500000

Cheers,
Dan

Dan Fairs | dan.fairs@gmail.com | @danfairs | www.fezconsulting.com
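As a note on the setting above: fs.file-max is the system-wide descriptor ceiling, which is separate from the per-process nofile limit — either one can be the wall you hit. Both are visible under /proc:

```shell
# System-wide ceiling (what fs.file-max in sysctl sets):
cat /proc/sys/fs/file-max
# Currently allocated / free / max, to see how close the whole box is:
cat /proc/sys/fs/file-nr
```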

--

lsof gist - https://gist.github.com/3494765

_nodes shows:

  "process" : {
    "refresh_interval" : 1000,
    "id" : 13266,
    "max_file_descriptors" : 256000

What am I missing?

--

I have a similar case on a 3 node elastic cluster with nofile set to
128000

elasticsearch version - 0.19.8

As an example, on one of the nodes right now I can see that Elasticsearch is
close to the limit and the number of open files is growing:

[root@box01 ~]# lsof -u elastic | wc -l
108576
[root@box01 ~]# lsof -u elastic | wc -l
108673

I've compared the lsof results from the two runs, and the difference is a lot
of files like:
...

java 22771 elastic *858r REG 8,2 7004 25368715
/home/elastic/data/elastic280/nodes/0/indices/logstash-2012.10.25/0/index/_pgc.nrm
java 22771 elastic *859u REG 8,2 1560 25368719
/home/elastic/data/elastic280/nodes/0/indices/logstash-2012.10.25/0/index/_pgd.tis
java 22771 elastic *860u REG 8,2 411 25368721
/home/elastic/data/elastic280/nodes/0/indices/logstash-2012.10.25/0/index/_pgd.frq
java 22771 elastic *861r REG 8,2 348 25368722
/home/elastic/data/elastic280/nodes/0/indices/logstash-2012.10.25/0/index/_pgd.prx
java 22771 elastic *862r REG 8,2 2252 25368717
/home/elastic/data/elastic280/nodes/0/indices/logstash-2012.10.25/0/index/_pgd.fdt
java 22771 elastic *863r REG 8,2 28 25368718
/home/elastic/data/elastic280/nodes/0/indices/logstash-2012.10.25/0/index/_pgd.fdx
java 22771 elastic *864r REG 8,2 88 25368723
/home/elastic/data/elastic280/nodes/0/indices/logstash-2012.10.25/0/index/_pgd.nrm
....
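One caveat on the counts above: on some lsof versions, `lsof -u elastic | wc -l` overstates the real descriptor count, because lsof can emit one row per task/thread for the same fd (a JVM has many threads) and also lists memory-mapped files that don't hold a descriptor. A sketch of the authoritative per-process count, straight from /proc (the pgrep pattern is an assumption):

```shell
# One entry per real open descriptor; falls back to the current shell's PID
# so the command runs even on a box without Elasticsearch.
ES_PID=$(pgrep -f elasticsearch | head -n1)
ls "/proc/${ES_PID:-$$}/fd" | wc -l
```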

Thanks
-- Alex

On Monday, August 27, 2012 8:50:03 PM UTC-7, Marietta wrote:


--

I've fixed this by optimizing with max_num_segments set low, and avoided it
by tweaking my merge rules.
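The fix described above, sketched as 0.19-era API calls — the index name and segment count here are examples, not values from the thread, and the commands are echoed rather than run since they need a live cluster:

```shell
# Example only: force-merge an index down to a few segments (old _optimize API).
INDEX=logstash-2012.10.25   # example index name, borrowed from the lsof output
URL="localhost:9200/${INDEX}/_optimize?max_num_segments=2"
echo "POST $URL"
# With a live cluster you would run: curl -XPOST "$URL"
```

Fewer segments means fewer Lucene files held open per shard, which is why this relieves descriptor pressure.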

On Thursday, October 25, 2012 5:54:13 PM UTC-5, AlexLevin wrote:


--

Hello Marietta,
Did you get any resolution?

On Sunday, 26 August 2012 01:25:29 UTC+1, LiMac wrote:


--

For anyone with a problem like this, it may be worth confirming the numbers
within Elasticsearch as well, using the nodes info API:

/_nodes?process

gives the max_file_descriptors:

{
  refresh_interval: 1000,
  id: 13919,
  max_file_descriptors: 25000
}

/_nodes/process/stats

gives the current usage in the output:

open_file_descriptors: 516
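To compare the configured limit against usage without eyeballing JSON, something like this works on a saved response — the file and values below are stand-ins mirroring the post above, not real `curl localhost:9200/_nodes?process=true` output:

```shell
# Stand-in for a saved node-info response; values mirror the post above.
cat > /tmp/node_info.json <<'EOF'
{ "process" : { "refresh_interval" : 1000, "id" : 13919, "max_file_descriptors" : 25000 } }
EOF
# Extract the limit the node reports:
grep -o '"max_file_descriptors" *: *[0-9]*' /tmp/node_info.json
```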

On Thursday, 15 November 2012 11:12:43 UTC, mohsin husen wrote:



--