Elasticsearch - Too many open files

Hello Elasticusers,

I have an issue with Elasticsearch. To begin with, here is my cluster configuration:

  • 3 nodes with 8 GB of RAM
  • A mounted disk (400 GB) to store data, logs, snapshots, etc.

For the data, I currently have (see the commands below):

  • 135 indices
  • 806 shards
  • 80,000,000 documents
  • 44 GB of data
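For reference, these figures come from the cluster APIs; a minimal sketch, assuming a node listening on localhost:9200:

  curl -XGET 'http://localhost:9200/_cluster/health?pretty'
  # -> status, number of nodes, active_primary_shards, active_shards

  curl -XGET 'http://localhost:9200/_cat/indices?v'
  # -> one row per index: health, docs.count, store.size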

My cluster was green and I had no issues; however, since today I have noticed that I can't create new indices, the cluster turns red, and the shard allocation/creation/replication processes go haywire.

Here is an example from my log file:

[2016-03-14 09:38:44,295][DEBUG][action.admin.indices.stats] [Cluster 2.2.0 ES1] [indices:monitor/stats] failed to execute operation for shard [[my_index][2], node[eOINqJryQ8yH9Pbg9S5Kag], [P], v[25], s[STARTED], a[id=addzc0pdTLGJVgEXAluIAg]]
ElasticsearchException[failed to refresh store stats]; nested: FileSystemException[/mnt/mounted_disk/my_index/2/index: Too many open files];
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1534)
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1519)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.index.store.Store.stats(Store.java:293)
at org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:665)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:134)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:165)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:409)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:388)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:375)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.FileSystemException: /mnt/mounted_disk/my_index/2/index: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:190)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:202)
at org.elasticsearch.index.store.FsDirectoryService$1.listAll(FsDirectoryService.java:127)
at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
at org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
at org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1540)
at org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1532)
... 15 more

I don't know what the issue is...
Maybe there are too many shards? Too many indices?

I have also noticed that the cached memory increases constantly and never goes down:

(Screenshots: cached memory at the beginning of the indexing, during the indexing, and at the end.)

Moreover, running the clear cache command doesn't free this memory.

Could it be a cache issue? Why does the cached memory never decrease?
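For reference, the call I mean is the clear cache API; a minimal sketch (my exact invocation may have differed):

  # drop the fielddata / query / request caches on all indices
  curl -XPOST 'http://localhost:9200/_cache/clear?pretty'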

Hope you can help me :slight_smile:

I'm assuming you have already checked the "ulimit" settings in Linux.
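A quick way to verify, assuming a node on localhost:9200 and Elasticsearch running as the elasticsearch user:

  # limit for the user running Elasticsearch
  ulimit -n

  # what each node actually got at startup
  curl -XGET 'http://localhost:9200/_nodes/process?pretty'        # max_file_descriptors
  curl -XGET 'http://localhost:9200/_nodes/stats/process?pretty'  # open_file_descriptors

  # raise the limit persistently, e.g. in /etc/security/limits.conf:
  #   elasticsearch  -  nofile  65535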

In theory, each shard is a full Lucene index, and a single shard can hold up to about 2 billion documents. Based on the numbers you gave, 80M documents spread over 806 shards averages out to roughly 100,000 documents per shard; you should use fewer indices and shards to give the system room to breathe. I suggest you look at your business requirements: how you want data indexed and searched, how often you need to delete old data (if at all), and so on, then do your best to estimate how many indices and shards you actually need. Hopefully you can find a number that fits your data and your requirements.
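To see where things stand, and to create indices with fewer shards going forward, something along these lines (index name and shard counts are placeholders):

  # how shards are currently spread across the nodes
  curl -XGET 'http://localhost:9200/_cat/shards?v'

  # new indices default to 5 primaries + 1 replica; ask for fewer explicitly
  curl -XPUT 'http://localhost:9200/my_new_index' -d '{
    "settings": { "number_of_shards": 2, "number_of_replicas": 1 }
  }'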

If you truly need to have that many indices and shards, you'll need more data nodes to spread them out.

In my own scenario, I have similar hardware with a bit more system RAM (32 GB), with HEAP_SIZE set to 8 GB on each node. I have three indices (each with 5 shards and no replicas), each holding 1B documents, and I don't see the problem you have. I skip replicas in this case because I'm only testing the system.

Check out this page of the Guide; it should help: https://www.elastic.co/guide/en/elasticsearch/guide/master/_file_descriptors_and_mmap.html
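In short, that page recommends raising both the file-descriptor limit (as above) and the mmap count; roughly:

  # the default store type relies heavily on mmap
  sysctl -w vm.max_map_count=262144
  # persist it in /etc/sysctl.conf so it survives a reboot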

Thank you @warkolm and @thn for your answers; they helped me resolve the issue.
The problem was that I had too many shards in my cluster (approximately 800). I reduced the number and now my cluster works fine. :wink:
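For anyone who finds this thread later, roughly what I did (index names are placeholders; on 2.2 there is no _reindex or _shrink API yet, so the copy step needs a scan/scroll script or a tool like Logstash):

  # 1. create a replacement index with fewer primary shards
  curl -XPUT 'http://localhost:9200/my_index_v2' -d '{
    "settings": { "number_of_shards": 1, "number_of_replicas": 1 }
  }'

  # 2. copy the documents over with a scan/scroll + bulk script (not shown here)

  # 3. delete the old index, then point its name at the new one via an alias
  curl -XDELETE 'http://localhost:9200/my_index'
  curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions": [ { "add": { "index": "my_index_v2", "alias": "my_index" } } ]
  }'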