Over the weekend I ran into a couple of problems with Elasticsearch, and I haven't been able to recover yet.
I only have one Elasticsearch node, and there are two time-based indices (e.g. test-2015.09.01) feeding data in.
The first error Elasticsearch hit was "Too many open files".
I checked the Elasticsearch startup script in /etc/init.d/: MAX_OPEN_FILES is set to 65535. But when I run curl -XGET 'http://localhost:9200/_nodes?process=true&pretty=true', max_file_descriptors is reported as 4096.
My questions are:
- What is the "Too many open files" error? Why are there so many files open?
- How do I decide a proper value for the MAX_OPEN_FILES setting?
- Do MAX_OPEN_FILES and max_file_descriptors refer to the same thing? If yes, why doesn't the setting in the startup script take effect?
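For reference, this is a sketch of how I'm checking the limits. The `ulimit` and `curl` commands are what I actually ran; the limits.conf lines are only one possible way to raise the limit, and assume the service runs as an "elasticsearch" user:

```shell
# Soft limit on open files for the current shell:
ulimit -n

# What limit the Elasticsearch process actually got (needs a running node):
#   curl -XGET 'http://localhost:9200/_nodes?process=true&pretty=true' | grep max_file_descriptors

# One possible way to raise the limit, in /etc/security/limits.conf
# (assuming the service runs as user "elasticsearch"):
#   elasticsearch soft nofile 65535
#   elasticsearch hard nofile 65535
```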
[test-2015.08.30][0] failed to execute bulk item (index) index {[test-2015.08.30]...
org.elasticsearch.index.engine.CreateFailedEngineException: [test-2015.08.30][0] Create failed for ...
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:275)
at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:483)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:423)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /var/data/elasticsearch/esearch02/nodes/0/indices/test-2015.08.30/0/index/_4u5.fdt (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:384)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:277)
at org.apache.lucene.store.FileSwitchDirectory.createOutput(FileSwitchDirectory.java:152)
at org.apache.lucene.store.RateLimitedFSDirectory.createOutput(RateLimitedFSDirectory.java:40)
at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:69)
at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:69)
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:113)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:120)
at org.apache.lucene.index.DefaultIndexingChain.initStoredFieldsWriter(DefaultIndexingChain.java:83)
at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:270)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:314)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
at org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:356)
at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:298)
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:269)
I restarted all the services (redis, the indexers, and Elasticsearch), but it doesn't seem to have recovered.
Elasticsearch log:
[2015-09-01 11:13:56,437][DEBUG][action.search.type ] [xxxx] All shards failed for phase: [query_fetch]
org.elasticsearch.action.NoShardAvailableActionException: [.marvel-2015.09.01][0] null
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:160)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:57)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:47)
...
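To see why queries still fail after the restart, I'm checking shard allocation with the `_cat/shards` API (`curl -s 'http://localhost:9200/_cat/shards'`). The sketch below uses hypothetical output, not my real cluster state; any shard still UNASSIGNED would explain the NoShardAvailableActionException above:

```shell
# Hypothetical sample of what _cat/shards might return on a node like this
# (real command: curl -s 'http://localhost:9200/_cat/shards'):
cat <<'EOF' > /tmp/shards.txt
test-2015.09.01    0 p STARTED    12345 4.2mb 127.0.0.1 xxxx
.marvel-2015.09.01 0 p UNASSIGNED
EOF

# A shard stuck in UNASSIGNED matches the NoShardAvailableActionException:
grep UNASSIGNED /tmp/shards.txt
```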