Too many open files warning

I am getting the warning messages below continuously and I am not sure what should be done. I have seen some related posts suggesting increasing the number of file descriptors.

How do I do that?

Even if I increase it now, will I encounter the same issue again as new indices are added?
(I am presently working with around 400 indices, each with 6 shards and 1 replica.) The number of indices will keep growing.

[03:58:24,165][WARN ][cluster.action.shard ] [node1] received shard failed for [index9][2], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index9][2] failed recovery]; nested: EngineCreationFailureException[[index9][2] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index9/2/index/segments_1 (Too many open files)]; ]]
[03:58:24,166][WARN ][cluster.action.shard ] [node1] received shard failed for [index15][0], node[node_hash2], [P], s[INITIALIZING], reason [Failed to create shard, message [IndexShardCreationException[[index15][0] failed to create shard]; nested: IOException[directory '/data/elasticsearch/whatever/nodes/0/indices/index15/0/index' exists and is a directory, but cannot be listed: list() returned null]; ]]
[03:58:24,195][WARN ][cluster.action.shard ] [node1] received shard failed for [index16][3], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index16][3] failed recovery]; nested: EngineCreationFailureException[[index16][3] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index16/3/index/segments_1 (Too many open files)]; ]]
[03:58:24,196][WARN ][cluster.action.shard ] [node1] received shard failed for [index17][0], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index17][0] failed recovery]; nested: EngineCreationFailureException[[index17][0] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index17/0/index/segments_1 (Too many open files)]; ]]
[03:58:24,198][WARN ][cluster.action.shard ] [node1] received shard failed for [index21][4], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index21][4] failed recovery]; nested: EngineCreationFailureException[[index21][4] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /data/elasticsearch/whatever/nodes/0/indices/index21/4/index/write.lock]; ]]

Output of the nodes API (curl -XGET 'http://localhost:9200/_nodes?os=true&process=true&pretty=true'):

{
  "ok" : true,
  "cluster_name" : "whatever",
  "nodes" : {
    "node_hash1" : {
      "name" : "node1",
      "transport_address" : "transportip1",
      "hostname" : "myhostip1",
      "version" : "0.20.4",
      "http_address" : "httpip1",
      "attributes" : {
        "data" : "false",
        "master" : "true"
      },
      "os" : {
        "refresh_interval" : 1000,
        "available_processors" : 8,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2133,
          "total_cores" : 8,
          "total_sockets" : 8,
          "cores_per_socket" : 16,
          "cache_size" : "4kb",
          "cache_size_in_bytes" : 4096
        },
        "mem" : {
          "total" : "7gb",
          "total_in_bytes" : 7516336128
        },
        "swap" : {
          "total" : "30gb",
          "total_in_bytes" : 32218378240
        }
      },
      "process" : {
        "refresh_interval" : 1000,
        "id" : 26188,
        "max_file_descriptors" : 16384
      }
    },
    "node_hash2" : {
      "name" : "node2",
      "transport_address" : "transportip2",
      "hostname" : "myhostip2",
      "version" : "0.20.4",
      "attributes" : {
        "master" : "false"
      },
      "os" : {
        "refresh_interval" : 1000,
        "available_processors" : 4,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2400,
          "total_cores" : 4,
          "total_sockets" : 4,
          "cores_per_socket" : 32,
          "cache_size" : "20kb",
          "cache_size_in_bytes" : 20480
        },
        "mem" : {
          "total" : "34.1gb",
          "total_in_bytes" : 36700303360
        },
        "swap" : {
          "total" : "0b",
          "total_in_bytes" : 0
        }
      },
      "process" : {
        "refresh_interval" : 1000,
        "id" : 24883,
        "max_file_descriptors" : 16384
      }
    },
    "node_hash3" : {
      "name" : "node3",
      "transport_address" : "transportip3",
      "hostname" : "myhostip3",
      "version" : "0.20.4",
      "attributes" : {
        "master" : "false"
      },
      "os" : {
        "refresh_interval" : 1000,
        "available_processors" : 4,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "Xeon",
          "mhz" : 2666,
          "total_cores" : 4,
          "total_sockets" : 4,
          "cores_per_socket" : 16,
          "cache_size" : "8kb",
          "cache_size_in_bytes" : 8192
        },
        "mem" : {
          "total" : "34.1gb",
          "total_in_bytes" : 36700303360
        },
        "swap" : {
          "total" : "0b",
          "total_in_bytes" : 0
        }
      },
      "process" : {
        "refresh_interval" : 1000,
        "id" : 25328,
        "max_file_descriptors" : 16384
      }
    }
  }
}

Hello,

Note this part:

  "process" : { 
    "refresh_interval" : 1000, 
    "id" : 25328, 
    "max_file_descriptors" : *16384* 
  } 

Don't be afraid to increase this; quadruple it, even. Your ~400 indices each have
some number of shards, each shard has some number of replicas, each of those is
a Lucene index, each Lucene index has some number of segments, and each segment
consists of a number of files. If you counted all of these files you would see
how you hit that 16384 limit. Directories count too, as do open socket
connections.
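
If you want to see how close each node actually is to that limit before raising it, you can count the descriptors the running process holds. A rough sketch, using one of the PIDs from the nodes output above as an example (paths are illustrative; on 0.20.x the stats endpoint may be /_cluster/nodes/stats instead, and not every version reports descriptor counts there):

  # descriptors currently held by the Elasticsearch process (PID 25328 = node3 above)
  ls /proc/25328/fd | wc -l

  # hard limit in effect for the user that launches Elasticsearch
  ulimit -Hn

  # if your version exposes it, process stats report the open descriptor count too
  curl 'http://localhost:9200/_nodes/stats?process=true&pretty=true'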

Otis



My advice here is: set the file handle limit to "unlimited" and you are good to go.
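
In practice the limit has to be raised in the environment that actually launches the Elasticsearch JVM, and the node restarted afterwards. A minimal sketch, assuming you start the node from a shell or init script (paths are illustrative; as Clinton notes below, "unlimited" is not accepted for nofile on every system, in which case use a large number):

  # raise the limit for this shell, then start the node from it
  ulimit -n unlimited
  /path/to/elasticsearch/bin/elasticsearch

  # verify what the running node actually got
  curl 'http://localhost:9200/_nodes?process=true&pretty=true' | grep max_file_descriptors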

simon


On Thu, 2013-04-11 at 05:37 -0700, simonw wrote:

my advice here is: set the file handle limit to "unlimited" and you
are good here

On some (all? many?) versions of Linux you can't set it to unlimited, in which case set it to a high number, e.g. 128k.
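
A sketch of the persistent variant with a fixed high number, assuming the node runs as a user named elasticsearch and that pam_limits is active for login sessions (adjust the user and the value to your setup):

  # /etc/security/limits.conf
  elasticsearch  soft  nofile  131072
  elasticsearch  hard  nofile  131072

  # check from a fresh login shell of that user before restarting the node
  su - elasticsearch -c 'ulimit -n'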

clint


Thanks everyone for the replies.

@simon and @Clinton - I have thought of that before. But would that cause any performance issues? (i.e., setting the number of open file descriptors to unlimited at the OS level)

It can be set on a per-user or per-group basis too.
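
For illustration, limits.conf accepts either a user name or a group name prefixed with '@' (the group name below is made up; '-' sets both the soft and the hard limit):

  # /etc/security/limits.conf - per-user entry
  elasticsearch   -  nofile  131072
  # per-group entry
  @elasticsearch  -  nofile  131072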


--
Anurag <0xB20A82C1>
http://web.gnuer.org/blog/


I always do this (unlimited or very high). The problem is that most of the time,
when you run into the high number of open files, it only lasts for a second or so
and then the number goes down quickly. Don't worry; what you don't want is running
out of file handles! I think it's safe to set it to unlimited.

simon


The Linux resource limit set with ulimit cannot exceed the hard limit given
by the file descriptor table of the kernel.

Under Linux 2.0, back in 1998, you had 256 file descriptors per process and a
maximum of 1024 for the system, and if you wanted more, you had to hack the
kernel header files and recompile :-)
https://www.redhat.com/archives/redhat-list/1998-November/msg03103.html

Under Linux 2.2 the resource limit was raised to 1024, and for 3.0 it
was raised to 4096.
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0ac1ee0bfec2a4ad118f907ce586d0dfd8db7641

Today, Linux has a hardcoded limit of 1024*1024 file descriptors, which was
later raised to "unlimited" (MAXINT); see
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/file.c

In earlier days, the Linux kernel reserved the space for the file descriptors
in a static array, so it was a ridiculous waste of memory if you set this
limit too high.

You can check your kernel-wide file limit with cat /proc/sys/fs/file-max, and
the current state of file descriptor allocation with cat /proc/sys/fs/file-nr
(the three numbers are: allocated handles, allocated but unused handles, and
the maximum).
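
Spelled out, together with the sysctl knob for raising the system-wide ceiling (the value below is only an example, not a recommendation):

  cat /proc/sys/fs/file-max   # kernel-wide maximum number of file handles
  cat /proc/sys/fs/file-nr    # allocated / unused / maximum

  # raise the system-wide ceiling if needed; persist it in /etc/sysctl.conf
  sysctl -w fs.file-max=262144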

Just to note, there is a difference between the soft and hard resource limits;
see:

" Applications are increasingly using more than 1024 file descriptors.
It is not recommended to increase the default soft limit of file
descriptors because it may break applications that use the select()
call. However, it is safe to increase the default hard limit; that way,
applications requiring a large amount of file descriptors can increase
their soft limit without needing root privileges and without any user
intervention."
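
In shell terms: the soft limit is what a process actually gets, and the process itself may raise it up to the hard limit without root privileges:

  ulimit -Sn        # current soft limit
  ulimit -Hn        # hard limit (the ceiling for the soft limit)
  ulimit -Sn 65536  # any user may raise the soft limit up to the hard limit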

Dan Kegel documented the situation of file handles some time ago in his
"C10K problem" paper.

To make things more unpredictable, the built-in Linux kernel limit and the
default resource limit value for ulimit depend on the Linux distribution. For
example, Red Hat Enterprise Linux still keeps the default ulimit setting of
1024.

For Elasticsearch, I have a rule of thumb: each shard (each Lucene index) takes
roughly 150 file descriptors (files, pipes, epolls, sockets, etc.), and
sometimes more, up to ~400 when segment merging produces peaks of file
descriptor consumption; this can be reduced by switching to the compound index
format. That is why ES works with an unmodified 1024 setting on a single
machine with the 5-shard default, but as soon as there are more nodes, replica
shards, and more segment merging going on, the 1024 limit will be exceeded.
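
Applying that rule of thumb to the cluster in this thread (a rough back-of-envelope; node1 is master-only with data set to false, so the shards sit on the two data nodes):

  400 indices x 6 shards x (1 primary + 1 replica) = 4800 shard copies
  4800 shard copies / 2 data nodes                 = ~2400 shards per node
  2400 shards x ~150 descriptors                   = ~360,000 descriptors per node

which is far beyond the 16384 currently configured, so raising the limit substantially is unavoidable at this index count.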

Jörg


Thanks @simonw-2.
In my case, I never saw it going down; it keeps increasing (maybe because the number of indices keeps growing). I'm already using the compound_format setting. That is why I'm a bit worried about the performance.
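
For reference, the compound format Jörg mentions is an index-level setting; a sketch of how it is usually applied on 0.20.x (verify the exact setting name and whether it can be updated dynamically on your version):

  # elasticsearch.yml - applies to indices created afterwards
  index.compound_format: true

  # or per existing index via the update-settings API, if your version allows it
  curl -XPUT 'http://localhost:9200/index9/_settings' -d '{
    "index.compound_format" : true
  }'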