Elasticsearch problems

Over the weekend, I had a couple of problems with Elasticsearch, and I haven't been able to recover yet.
I only have one Elasticsearch node, and there are two time-based indices (e.g. test-2015.09.01) feeding data in.

The first error Elasticsearch hit is "too many open files".
I checked the Elasticsearch startup file in the /etc/init.d/ folder; MAX_OPEN_FILES is set to 65535, but when I run curl -XGET 'http://localhost:9200/_nodes?process=true&pretty=true', max_file_descriptors is 4096.
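For reference, this is how I compared the two values (assuming the standard MAX_OPEN_FILES variable from the packaged init script; output abbreviated):

    # what the init script claims to set
    grep MAX_OPEN_FILES /etc/init.d/elasticsearch

    # what the running process actually got
    curl -XGET 'http://localhost:9200/_nodes?process=true&pretty=true' | grep max_file_descriptors
    #       "max_file_descriptors" : 4096,
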
My questions are:

  1. What is the "too many open files" error? Why are there so many files open?
  2. How do I decide a proper value for the MAX_OPEN_FILES setting?
  3. Are MAX_OPEN_FILES and max_file_descriptors referring to the same thing? If yes, why doesn't the setting in the startup script take effect?

[test-2015.08.30][0] failed to execute bulk item (index) index {[test-2015.08.30]...
org.elasticsearch.index.engine.CreateFailedEngineException: [test-2015.08.30][0] Create failed for ...
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:275)
at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:483)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:423)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /var/data/elasticsearch/esearch02/nodes/0/indices/test-2015.08.30/0/index/_4u5.fdt (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:384)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:277)
at org.apache.lucene.store.FileSwitchDirectory.createOutput(FileSwitchDirectory.java:152)
at org.apache.lucene.store.RateLimitedFSDirectory.createOutput(RateLimitedFSDirectory.java:40)
at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:69)
at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:69)
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:113)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:120)
at org.apache.lucene.index.DefaultIndexingChain.initStoredFieldsWriter(DefaultIndexingChain.java:83)
at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:270)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:314)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
at org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:356)
at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:298)
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:269)

I restarted all the services: Redis, the indexers, and Elasticsearch, but it doesn't seem to have recovered.
Elasticsearch log:
[2015-09-01 11:13:56,437][DEBUG][action.search.type ] [xxxx] All shards failed for phase: [query_fetch]
org.elasticsearch.action.NoShardAvailableActionException: [.marvel-2015.09.01][0] null
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:160)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:57)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:47)
...

Most likely because you have too many shards per node.

Try checking that first...
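For example, something like this shows the shard count (the _cat API exists since ES 1.0):

    # one line per shard; the total is what matters here
    curl -s 'http://localhost:9200/_cat/shards' | wc -l

    # or look at the summary numbers
    curl -s 'http://localhost:9200/_cluster/health?pretty'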

You don't want to set ulimit in that file.

Take a look at this chapter of the Definitive Guide - https://www.elastic.co/guide/en/elasticsearch/guide/current/deploy.html
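In short: raise the limit for the user Elasticsearch runs as, not with an ad-hoc ulimit call in the init script itself. A minimal sketch, assuming a Debian-style package install (file paths and the elasticsearch user name may differ on your system):

    # /etc/security/limits.conf -- per-user limit
    elasticsearch  -  nofile  65535

    # /etc/default/elasticsearch -- picked up by the packaged init script,
    # which runs ulimit -n with this value before starting the JVM
    MAX_OPEN_FILES=65535

After restarting, re-run the _nodes?process=true check to confirm that max_file_descriptors picked up the new value.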

Hmm, I actually never paid attention to the shards...
I have 86 shards:
{"cluster_name":"esearch02","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":86,"active_shards":86,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":86,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}

And from the head plugin:
cluster health: yellow (86 of 172 shards)

Is there a way to control the number of shards? And what's the optimal setting?

You set the shard count when you create the index; you can control that via a template or other methods. (By the way, your cluster is yellow because the 86 replica shards can never be allocated on a single node: a replica is never placed on the same node as its primary.)

There's no optimal setting; it depends on your use case.
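If your daily indices follow a naming pattern, a template is the easiest way. A sketch against the 1.x template API (the template name here is arbitrary; adjust the counts to your case, and note that replicas buy you nothing on a one-node cluster):

    curl -XPUT 'http://localhost:9200/_template/test_template' -d '{
      "template": "test-*",
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    }'

This only affects indices created after the template exists. For existing indices you can still drop replicas dynamically (number_of_replicas is updatable via the _settings API), but the primary shard count is fixed at index creation.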

I used all the default settings, i.e. 5 primary shards per index and 1 replica. It looks like there's a lot to do to decide on the right settings...
Back to my original question: could you help explain a little more about this "too many open files" problem? What is this error? Why are there so many files open? Thank you.

Because the underlying Lucene engine creates a bunch of files to manage each shard; factor in open ports, sockets, and so on, and it adds up quickly.
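If you want to see it for yourself, count the descriptors the JVM is holding, e.g. on Linux (the pgrep pattern assumes the 1.x bootstrap class name):

    # count open file descriptors for the Elasticsearch JVM
    ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
    ls /proc/$ES_PID/fd | wc -l

    # break it down to just the Lucene index files under the data path
    lsof -p $ES_PID | grep '/indices/' | wc -l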

Sounds complicated. Thank you very much!

Hi @zpp,

I am also getting the same problem in my ES cluster.
Can you please tell me how to increase the number of shards in order to get rid of this NoShardAvailableActionException error?

Thanks in advance.
amits

Please start another thread for your question.