Elasticsearch - Too many open files

Hello Elasticusers,

I have an issue with Elasticsearch. To begin with, here is my cluster configuration:

  • 3 nodes with 8 GB of RAM
  • A mounted disk (400 GB) to store data, logs, snapshots, etc.

For the data, I currently have (see the commands below):

  • 135 indices
  • 806 shards
  • 80,000,000 documents
  • 44 GB of data
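For reference, these figures come from the cluster APIs; a minimal sketch, assuming a node listening on localhost:9200:

  curl -XGET 'http://localhost:9200/_cluster/health?pretty'
  # -> status, number of nodes, active_primary_shards, active_shards

  curl -XGET 'http://localhost:9200/_cat/indices?v'
  # -> one row per index: health, docs.count, store.size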

My cluster was green and I had no issues; however, since today I have noticed that I can't create new indices, the cluster turns red, and the shard allocation/creation/replication processes go haywire.

Here is an example from my log file:

[2016-03-14 09:38:44,295][DEBUG][action.admin.indices.stats] [Cluster 2.2.0 ES1] [indices:monitor/stats] failed to execute operation for shard [[my_index][2], node[eOINqJryQ8yH9Pbg9S5Kag], [P], v[25], s[STARTED], a[id=addzc0pdTLGJVgEXAluIAg]]
ElasticsearchException[failed to refresh store stats]; nested: FileSystemException[/mnt/mounted_disk/my_index/2/index: Too many open files];
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1534)
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1519)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.index.store.Store.stats(Store.java:293)
at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:665)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:134)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:165)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:409)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:388)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:375)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.FileSystemException: /mnt/mounted_disk/my_index/2/index: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:190)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:202)
at org.elasticsearch.index.store.FsDirectoryService$1.listAll(FsDirectoryService.java:127)
at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
at org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1540)
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1532)
... 15 more

I don't know what the issue is...
Maybe there are too many shards? Too many indices?

I have also noticed that the cached memory increases constantly and never goes down:

(Screenshots: cached memory at the beginning of the indexing, during the indexing, and at the end.)

Moreover, running the clear cache command doesn't free this memory.

Could it be a cache issue? Why does the cached memory never decrease?
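For reference, the call I mean is the clear cache API; a minimal sketch (my exact invocation may have differed):

  # drop the fielddata / query / request caches on all indices
  curl -XPOST 'http://localhost:9200/_cache/clear?pretty'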

Hope you can help me :slight_smile:

I'm assuming you have already checked the "ulimit" settings in Linux.
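A quick way to verify, assuming a node on localhost:9200 and Elasticsearch running as the elasticsearch user:

  # limit for the user running Elasticsearch
  ulimit -n

  # what each node actually got at startup
  curl -XGET 'http://localhost:9200/_nodes/process?pretty'        # max_file_descriptors
  curl -XGET 'http://localhost:9200/_nodes/stats/process?pretty'  # open_file_descriptors

  # raise the limit persistently, e.g. in /etc/security/limits.conf:
  #   elasticsearch  -  nofile  65535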

In theory, each shard is a full Lucene index, and a single shard can hold up to about 2 billion documents. Based on the numbers you gave, 80M documents spread over 806 shards averages out to roughly 100,000 documents per shard; you should use fewer indices and shards to give the system room to breathe. I suggest you look at your business requirements: how you want data indexed and searched, how often you need to delete old data (if at all), and so on, then do your best to estimate how many indices and shards you actually need. Hopefully you can find a number that fits your data and your requirements.
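To see where things stand, and to create indices with fewer shards going forward, something along these lines (index name and shard counts are placeholders):

  # how shards are currently spread across the nodes
  curl -XGET 'http://localhost:9200/_cat/shards?v'

  # new indices default to 5 primaries + 1 replica; ask for fewer explicitly
  curl -XPUT 'http://localhost:9200/my_new_index' -d '{
    "settings": { "number_of_shards": 2, "number_of_replicas": 1 }
  }'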

If you truly need to have that many indices and shards, you'll need more data nodes to spread them out.

In my own scenario, I have similar hardware with a bit more system RAM (32 GB), with HEAP_SIZE set to 8 GB on each node. I have three indices (each with 5 shards and no replicas), each holding 1B documents, and I don't see the problem you have. I skip replicas in this case because I'm only testing the system.

Check out this page of the Guide; it should help: https://www.elastic.co/guide/en/elasticsearch/guide/master/_file_descriptors_and_mmap.html
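In short, that page recommends raising both the file-descriptor limit (as above) and the mmap count; roughly:

  # the default store type relies heavily on mmap
  sysctl -w vm.max_map_count=262144
  # persist it in /etc/sysctl.conf so it survives a reboot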

Thank you @warkolm and @thn for your answers; they helped me resolve the issue.
The problem was that I had too many shards in my cluster (approximately 800). I reduced the number and now my cluster works fine. :wink:
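For anyone who finds this thread later, roughly what I did (index names are placeholders; on 2.2 there is no _reindex or _shrink API yet, so the copy step needs a scan/scroll script or a tool like Logstash):

  # 1. create a replacement index with fewer primary shards
  curl -XPUT 'http://localhost:9200/my_index_v2' -d '{
    "settings": { "number_of_shards": 1, "number_of_replicas": 1 }
  }'

  # 2. copy the documents over with a scan/scroll + bulk script (not shown here)

  # 3. delete the old index, then point its name at the new one via an alias
  curl -XDELETE 'http://localhost:9200/my_index'
  curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions": [ { "add": { "index": "my_index_v2", "alias": "my_index" } } ]
  }'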