Too many open files error logged

Hi,

Does anyone know how I can fix the following error? Stopping and restarting the service does not solve it.

[2016-02-09 09:00:45,994][INFO ][node ] [Thunderbolt] version[1.7.3], pid[6450], build[05d4530/2015-10-15T09:14:17Z]
[2016-02-09 09:00:45,994][INFO ][node ] [Thunderbolt] initializing ...
[2016-02-09 09:00:46,073][INFO ][plugins ] [Thunderbolt] loaded [], sites []
[2016-02-09 09:00:46,107][INFO ][env ] [Thunderbolt] using [1] data paths, mounts [[/opt (/dev/mapper/opt--directory--vg-opt--lv)]], net usable_space [51.1gb], ne$
[2016-02-09 09:00:48,434][INFO ][node ] [Thunderbolt] initialized
[2016-02-09 09:00:48,434][INFO ][node ] [Thunderbolt] starting ...
[2016-02-09 09:00:48,508][INFO ][transport ] [Thunderbolt] bound_address {inet[/127.0.0.1:9300]}, publish_address {inet[localhost/127.0.0.1:9300]}
[2016-02-09 09:00:48,523][INFO ][discovery ] [Thunderbolt] elasticsearch/CCuICQ1KQgKipMcO7l056w
[2016-02-09 09:00:52,289][INFO ][cluster.service ] [Thunderbolt] new_master [Thunderbolt][CCuICQ1KQgKipMcO7l056w][lu01pu-app-hsc.lu.euroscript.local][inet[localhost/127.0.0.1$
[2016-02-09 09:00:52,371][INFO ][http ] [Thunderbolt] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2016-02-09 09:00:52,371][INFO ][node ] [Thunderbolt] started
[2016-02-09 09:00:52,572][INFO ][gateway ] [Thunderbolt] recovered [98] indices into cluster_state
[2016-02-09 09:01:00,862][WARN ][indices.cluster ] [Thunderbolt] [[logstash-2015.10.28][4]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2015.10.28][4] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2015.10.28][4] failed to create engine
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:143)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:32)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1355)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1350)
at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:870)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:233)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
... 3 more
Caused by: java.nio.file.FileSystemException: /opt/elasticsearch-1.7.3/data/elasticsearch/nodes/0/indices/logstash-2015.10.28/4/index/_b.cfe: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
at org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:172)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:733)
at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
at org.apache.lucene.store.CompoundFileDirectory.readEntries(CompoundFileDirectory.java:166)
at org.apache.lucene.store.CompoundFileDirectory.<init>(CompoundFileDirectory.java:106)
at org.apache.lucene.index.SegmentReader.readFieldInfos(SegmentReader.java:274)
at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:867)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:819)
at org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:1119)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:141)
... 9 more

Thanks for your time,
Dorin

Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/2.2/setup-configuration.html#file-descriptors

Also, try to reduce the number of shards on a single machine. I guess you have too many shards, right?
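
In short, that page boils down to raising the open files (nofile) limit for the user that runs Elasticsearch. A minimal sketch, assuming the service runs as an "elasticsearch" user on a typical Linux box (the exact file and user name depend on your distribution and packaging):

# /etc/security/limits.conf
elasticsearch  soft  nofile  64000
elasticsearch  hard  nofile  64000

Then restart the node and verify that the new limit was actually picked up.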

@dadoonet I've tried to run the following command:
sysctl -w vm.max_map_count=262144

I then restarted the service, and the error is still there. Is there something else that I need to do?

Many thanks for your help,
Dorin

Did you check that this setting has really been applied? IIRC you can see it in the Node Info or Node Stats API.
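
For example, something like this should show it, assuming the node is listening on localhost:9200 as in your log:

curl -s 'localhost:9200/_nodes/process?pretty'

Look for max_file_descriptors in the process section of the node.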

If it was correctly set, then you are running out of resources. You need to launch more nodes.

How many shards / nodes do you have?

@dadoonet I think I managed to fix it. After adding the command ulimit -n 64000, the max_file_descriptors field is now set to 64000 and the error is gone.
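
For anyone else hitting this, you can also double-check the limit from the OS side by looking at the limits of the running Elasticsearch process, for example (6450 is the pid from the startup log above; it changes on every restart, so substitute the current one):

cat /proc/6450/limits | grep 'Max open files'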