ElasticSearch giving FileNotFoundException: (Too many open files)


(prateek) #1

Hi Team,

I have built a logging pipeline using Logstash + Redis + Elasticsearch, with Kibana on top.

Kibana is frequently not showing logs from some hosts, and sometimes it
shows no recent logs from any host at all.

While debugging I noticed some strange entries in the Elasticsearch log
that mention "Too many open files".

Please find the relevant entries from /var/log/elasticsearch.log below:

[2014-04-29 15:13:00,033][WARN ][cluster.action.shard ] [Whitemane,
Aelfyre] [logstash-2014.04.20][1] sending failed shard for
[logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P],
s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to
start shard, message
[IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed
recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1]
failed to open reader on writer]; nested:
FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc
(Too many open files)]; ]]

[2014-04-29 15:13:00,033][WARN ][cluster.action.shard ] [Whitemane,
Aelfyre] [logstash-2014.04.20][1] received shard failed for
[logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P],
s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to
start shard, message
[IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed
recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1]
failed to open reader on writer]; nested:
FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc
(Too many open files)]; ]]

[2014-04-29 15:13:00,039][WARN ][index.engine.robin ] [Whitemane,
Aelfyre] [logstash-2014.04.29][1] shard is locked, releasing lock

[2014-04-29 15:13:00,039][WARN ][indices.cluster ] [Whitemane,
Aelfyre] [logstash-2014.04.29][1] failed to start shard

org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[logstash-2014.04.29][1] failed recovery

    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.29][1] failed to create engine
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:256)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    ... 3 more
Caused by: org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock
    at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:295)
    at org.apache.lucene.index.IndexWriter.unlock(IndexWriter.java:4458)
    at org.elasticsearch.index.engine.robin.RobinEngine.createWriter(RobinEngine.java:1415)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:254)
    ... 6 more

[2014-04-29 15:13:00,041][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] sending failed shard for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]

[2014-04-29 15:13:00,041][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] received shard failed for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]

[2014-04-29 15:13:00,052][WARN ][indices.cluster ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] failed to start shard

org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2014.04.20][1] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.20][1] failed to open reader on writer
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:287)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    ... 3 more
Caused by: java.io.FileNotFoundException: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:388)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:127)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:80)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:471)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.<init>(Lucene41PostingsReader.java:72)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:430)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsProducer.<init>(BloomFilterPostingsFormat.java:131)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat.fieldsProducer(BloomFilterPostingsFormat.java:102)
    at org.elasticsearch.index.codec.postingsformat.ElasticSearch090PostingsFormat.fieldsProducer(ElasticSearch090PostingsFormat.java:79)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:195)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:115)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
    at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
    at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
    at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1530)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:277)
    ... 6 more

[2014-04-29 15:13:00,055][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] sending failed shard for [logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1] failed to open reader on writer]; nested: FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)]; ]]

[2014-04-29 15:13:00,055][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] received shard failed for [logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1] failed to open reader on writer]; nested: FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)]; ]]

[2014-04-29 15:13:00,060][WARN ][index.engine.robin ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] shard is locked, releasing lock

[2014-04-29 15:13:00,060][WARN ][indices.cluster ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] failed to start shard

org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2014.04.29][1] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.29][1] failed to create engine
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:256)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    ... 3 more
Caused by: org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock
    at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:295)
    at org.apache.lucene.index.IndexWriter.unlock(IndexWriter.java:4458)
    at org.elasticsearch.index.engine.robin.RobinEngine.createWriter(RobinEngine.java:1415)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:254)
    ... 6 more

[2014-04-29 15:13:00,063][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] sending failed shard for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]

[2014-04-29 15:13:00,063][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] received shard failed for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]

[2014-04-29 15:13:00,074][WARN ][indices.cluster ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] failed to start shard

org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2014.04.20][1] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.20][1] failed to open reader on writer
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:287)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    ... 3 more
Caused by: java.io.FileNotFoundException: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:388)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:127)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:80)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:471)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.<init>(Lucene41PostingsReader.java:72)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:430)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsProducer.<init>(BloomFilterPostingsFormat.java:131)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat.fieldsProducer(BloomFilterPostingsFormat.java:102)
    at org.elasticsearch.index.codec.postingsformat.ElasticSearch090PostingsFormat.fieldsProducer(ElasticSearch090PostingsFormat.java:79)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:195)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:115)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
    at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
    at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
    at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1530)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:277)
    ... 6 more

I have added these values to the /etc/security/limits.conf file:

root - memlock unlimited
root soft nofile 800000
root hard nofile 1000000

and this chunk of code in the logstash.in.sh script:

if [ "x$MAX_OPEN_FILES" != "x" ]; then
  MAX_OPEN_FILES=100000
fi
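Incidentally, the test in that snippet looks inverted: as written, it reassigns MAX_OPEN_FILES only when the variable is already set, so the default never takes effect. A hedged sketch of the usual default-if-unset pattern (the 65535 value here is an arbitrary illustration, not a recommendation):

```shell
# Minimal sketch, assuming the MAX_OPEN_FILES variable from the post:
# assign a default only when the variable is NOT already set. Note "=",
# where the snippet above uses "!=".
if [ "x$MAX_OPEN_FILES" = "x" ]; then
  MAX_OPEN_FILES=65535
fi
echo "MAX_OPEN_FILES=$MAX_OPEN_FILES"
# The startup script would then apply this before launching the JVM, e.g.:
#   ulimit -n "$MAX_OPEN_FILES"
```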

Logs are arriving from the agents, but I think Elasticsearch is not able
to index them properly.

Please help me resolve this issue. This is urgent.

Thanks & Regards,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4b282321-45b8-4f7b-880b-3f0bce2013d0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Brian Yoder) #2

Prateek,

I've collected this from various sources and put it all together. It
works fine for me, though I haven't dived into ELK yet:


You can verify the current soft limit by logging in as the user that runs
the Elasticsearch JVM and issuing the following command:

$ ulimit -Sn

Then verify that Elasticsearch is indeed able to open up to this number
of file handles by checking the max_file_descriptors value for each node
via the _nodes API:

$ curl localhost:9200/_nodes/process?pretty && echo
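As a cross-check that doesn't depend on the API, on Linux the limit and current usage of any process can be read straight from /proc. A hedged sketch; discovering the Elasticsearch JVM's PID (e.g. via pgrep) is left as an assumption:

```shell
# Hedged sketch (Linux only): read a process's open-file limit and its
# current descriptor count from /proc. PID defaults to this shell; for
# Elasticsearch, substitute the JVM's PID, for example from:
#   pgrep -f org.elasticsearch
PID=${PID:-$$}
grep 'Max open files' "/proc/$PID/limits"
FD_COUNT=$(ls "/proc/$PID/fd" | wc -l)
echo "open descriptors: $FD_COUNT"
```

If the descriptor count keeps climbing toward the limit, that points at the leak or undersized limit responsible for the FileNotFoundException above.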

ON LINUX

Update the /etc/security/limits.conf file and ensure that it contains the
following two lines:

username hard nofile 65537
username soft nofile 65536

Of course, replace 'username' with the account that runs Elasticsearch on
your own machines.

ON SOLARIS

In Solaris 9, the default limit of file descriptors per process was raised
from 1024 to 65536.

ON MAC OS X

Create or edit /etc/launchd.conf and add the following line:

limit maxfiles 400000 400000

Then shut down and restart the Mac. Verify the settings by opening a new
terminal and running either or both of the commands below:

$ launchctl limit maxfiles
$ ulimit -a

You should see maxfiles set to 400000 in the output of both commands.


Note that this is more of a Unix thing than an Elasticsearch thing, so if
you are still having issues you may wish to ask on a newsgroup that
specifically targets your operating system.

Also note that it's not good practice to run an application as root:
there is too much chance of wiping out something from which you could
never recover. I remember that once our operations folks started ES as
root; after that the data files were owned by root, and the non-root ES
user had trouble starting, with locking errors all over the logs. I ended
up performing a recursive chown on the ES filesystem, and when ES was
restarted as the non-root user all was well again.
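That recovery can be sketched as a single recursive chown. The data path below is the one from the log excerpts above; the "elasticsearch" user and group names are assumptions, so substitute whichever non-root account runs your node:

```shell
# Run as root. Account name "elasticsearch" is an assumption; the path is
# taken from the log excerpts earlier in this thread.
chown -R elasticsearch:elasticsearch /usr/local/elasticsearch-0.90.9/data
```

This also clears the "Cannot forcefully unlock a NativeFSLock" errors when they stem from root-owned write.lock files.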

Brian



(system) #3