Elasticsearch problems

Over the weekend, I had a couple of problems with Elasticsearch, and I haven't been able to recover yet.
I only have one Elasticsearch node, and there are two time-based indices (e.g. test-2015.09.01) feeding data in.

The first error Elasticsearch hit is "too many open files".
I checked the Elasticsearch startup file in the /etc/init.d/ folder; MAX_OPEN_FILES is set to 65535, but when I run curl -XGET 'http://localhost:9200/_nodes?process=true&pretty=true', max_file_descriptors is 4096.
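For reference, this is how I compared the two values (assuming the standard MAX_OPEN_FILES variable from the packaged init script; output abbreviated):

    # what the init script claims to set
    grep MAX_OPEN_FILES /etc/init.d/elasticsearch

    # what the running process actually got
    curl -XGET 'http://localhost:9200/_nodes?process=true&pretty=true' | grep max_file_descriptors
    #       "max_file_descriptors" : 4096,
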
My questions are:

  1. What is the "too many open files" error? Why are there so many files open?
  2. How do I decide a proper value for the MAX_OPEN_FILES setting?
  3. Are MAX_OPEN_FILES and max_file_descriptors referring to the same thing? If yes, why doesn't the setting in the startup script take effect?

[test-2015.08.30][0] failed to execute bulk item (index) index {[test-2015.08.30]...
org.elasticsearch.index.engine.CreateFailedEngineException: [test-2015.08.30][0] Create failed for ...
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:275)
at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:483)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:423)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /var/data/elasticsearch/esearch02/nodes/0/indices/test-2015.08.30/0/index/_4u5.fdt (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:384)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:277)
at org.apache.lucene.store.FileSwitchDirectory.createOutput(FileSwitchDirectory.java:152)
at org.apache.lucene.store.RateLimitedFSDirectory.createOutput(RateLimitedFSDirectory.java:40)
at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:69)
at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:69)
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:113)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:120)
at org.apache.lucene.index.DefaultIndexingChain.initStoredFieldsWriter(DefaultIndexingChain.java:83)
at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:270)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:314)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
at org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:356)
at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:298)
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:269)

I restarted all the services: Redis, the indexers, and Elasticsearch, but it doesn't seem to have recovered.
Elasticsearch log:
[2015-09-01 11:13:56,437][DEBUG][action.search.type ] [xxxx] All shards failed for phase: [query_fetch]
org.elasticsearch.action.NoShardAvailableActionException: [.marvel-2015.09.01][0] null
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:160)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:57)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:47)
...

Most likely because you have too many shards per node.

Try checking that first...
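For example, something like this shows the shard count (the _cat API exists since ES 1.0):

    # one line per shard; the total is what matters here
    curl -s 'http://localhost:9200/_cat/shards' | wc -l

    # or look at the summary numbers
    curl -s 'http://localhost:9200/_cluster/health?pretty'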

You don't want to set ulimit in that file.

Take a look at this chapter of the Definitive Guide - https://www.elastic.co/guide/en/elasticsearch/guide/current/deploy.html
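In short: raise the limit for the user Elasticsearch runs as, not with an ad-hoc ulimit call in the init script itself. A minimal sketch, assuming a Debian-style package install (file paths and the elasticsearch user name may differ on your system):

    # /etc/security/limits.conf -- per-user limit
    elasticsearch  -  nofile  65535

    # /etc/default/elasticsearch -- picked up by the packaged init script,
    # which runs ulimit -n with this value before starting the JVM
    MAX_OPEN_FILES=65535

After restarting, re-run the _nodes?process=true check to confirm that max_file_descriptors picked up the new value.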

Hmm, I actually never paid attention to the shards...
I have 86 shards:
{"cluster_name":"esearch02","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":86,"active_shards":86,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":86,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}

And from the head plugin:
cluster health: yellow (86 of 172 shards)

Is there a way to control the number of shards? And what's the optimal setting?

You set the shard count when you create the index; you can control that via a template or other methods. (By the way, your cluster is yellow because the 86 replica shards can never be allocated on a single node: a replica is never placed on the same node as its primary.)

There's no optimal setting; it depends on your use case.
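If your daily indices follow a naming pattern, a template is the easiest way. A sketch against the 1.x template API (the template name here is arbitrary; adjust the counts to your case, and note that replicas buy you nothing on a one-node cluster):

    curl -XPUT 'http://localhost:9200/_template/test_template' -d '{
      "template": "test-*",
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    }'

This only affects indices created after the template exists. For existing indices you can still drop replicas dynamically (number_of_replicas is updatable via the _settings API), but the primary shard count is fixed at index creation.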

I used all the default settings, i.e. 5 primary shards per index and 1 replica. It looks like there's a lot to do to decide on the right settings...
Back to my original question: could you help explain a little more about this "too many open files" problem? What is this error? Why are there so many files open? Thank you.

Because the underlying Lucene engine creates a bunch of files to manage each shard; factor in open ports, sockets, and so on, and it adds up quickly.
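If you want to see it for yourself, count the descriptors the JVM is holding, e.g. on Linux (the pgrep pattern assumes the 1.x bootstrap class name):

    # count open file descriptors for the Elasticsearch JVM
    ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
    ls /proc/$ES_PID/fd | wc -l

    # break it down to just the Lucene index files under the data path
    lsof -p $ES_PID | grep '/indices/' | wc -l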

Sounds complicated. Thank you very much!

Hi @zpp,

I am also getting the same problem in my ES cluster.
Can you please tell me how to increase the number of shards in order to get rid of this NoShardAvailableActionException error?

Thanks in advance.
amits

Please start another thread for your question.