Not showing logs from a host JavaException (Too many open files)


(prateek) #1

Hi Team,

I have implemented logstash using LogStash+Redis+ElasticSearch and Kibana

using kibana system is not showing logs from some hosts and this issue is coming very frequently. sometimes kibana is not showing logs of recent times at all from all hosts.

While debugging I seen some strange logs in elasticsearch log files. Which says that (Too many files open) kind of things .

Please find logs from /var/log/elasticsearch.log file

[2014-04-29 15:13:00,033][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] sending failed shard for [logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1] failed to open reader on writer]; nested: FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)]; ]]
[2014-04-29 15:13:00,033][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] received shard failed for [logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1] failed to open reader on writer]; nested: FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)]; ]]
[2014-04-29 15:13:00,039][WARN ][index.engine.robin ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] shard is locked, releasing lock
[2014-04-29 15:13:00,039][WARN ][indices.cluster ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2014.04.29][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.29][1] failed to create engine
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:256)
at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
... 3 more
Caused by: org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock
at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:295)
at org.apache.lucene.index.IndexWriter.unlock(IndexWriter.java:4458)
at org.elasticsearch.index.engine.robin.RobinEngine.createWriter(RobinEngine.java:1415)
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:254)
... 6 more
[2014-04-29 15:13:00,041][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] sending failed shard for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]
[2014-04-29 15:13:00,041][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] received shard failed for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]
[2014-04-29 15:13:00,052][WARN ][indices.cluster ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2014.04.20][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.20][1] failed to open reader on writer
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:287)
at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
... 3 more
Caused by: java.io.FileNotFoundException: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexInput.(FSDirectory.java:388)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.(NIOFSDirectory.java:127)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:471)
at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.(Lucene41PostingsReader.java:72)
at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:430)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsProducer.(BloomFilterPostingsFormat.java:131)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat.fieldsProducer(BloomFilterPostingsFormat.java:102)
at org.elasticsearch.index.codec.postingsformat.ElasticSearch090PostingsFormat.fieldsProducer(ElasticSearch090PostingsFormat.java:79)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:195)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
at org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:115)
at org.apache.lucene.index.SegmentReader.(SegmentReader.java:95)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
at org.apache.lucene.search.SearcherManager.(SearcherManager.java:89)
at org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1530)
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:277)
... 6 more
[2014-04-29 15:13:00,055][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] sending failed shard for [logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1] failed to open reader on writer]; nested: FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)]; ]]
[2014-04-29 15:13:00,055][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] received shard failed for [logstash-2014.04.20][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [silMCoFlSdWJf66yAgpybQ], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.20][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.20][1] failed to open reader on writer]; nested: FileNotFoundException[/usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)]; ]]
[2014-04-29 15:13:00,060][WARN ][index.engine.robin ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] shard is locked, releasing lock
[2014-04-29 15:13:00,060][WARN ][indices.cluster ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2014.04.29][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.29][1] failed to create engine
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:256)
at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
... 3 more
Caused by: org.apache.lucene.store.LockReleaseFailedException: Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock
at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:295)
at org.apache.lucene.index.IndexWriter.unlock(IndexWriter.java:4458)
at org.elasticsearch.index.engine.robin.RobinEngine.createWriter(RobinEngine.java:1415)
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:254)
... 6 more
[2014-04-29 15:13:00,063][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] sending failed shard for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]
[2014-04-29 15:13:00,063][WARN ][cluster.action.shard ] [Whitemane, Aelfyre] [logstash-2014.04.29][1] received shard failed for [logstash-2014.04.29][1], node[NTHTtK4DRIuCrm5RKgx30g], [P], s[INITIALIZING], indexUUID [W2ZbxZCXQYecXw8Jjrabhg], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[logstash-2014.04.29][1] failed recovery]; nested: EngineCreationFailureException[[logstash-2014.04.29][1] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.29/1/index/write.lock]; ]]
[2014-04-29 15:13:00,074][WARN ][indices.cluster ] [Whitemane, Aelfyre] [logstash-2014.04.20][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-2014.04.20][1] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:232)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-2014.04.20][1] failed to open reader on writer
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:287)
at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
... 3 more
Caused by: java.io.FileNotFoundException: /usr/local/elasticsearch-0.90.9/data/elasticsearch/nodes/0/indices/logstash-2014.04.20/1/index/_f0r_es090_0.doc (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexInput.(FSDirectory.java:388)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.(NIOFSDirectory.java:127)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:80)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:471)
at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.(Lucene41PostingsReader.java:72)
at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:430)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsProducer.(BloomFilterPostingsFormat.java:131)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat.fieldsProducer(BloomFilterPostingsFormat.java:102)
at org.elasticsearch.index.codec.postingsformat.ElasticSearch090PostingsFormat.fieldsProducer(ElasticSearch090PostingsFormat.java:79)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:195)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:244)
at org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:115)
at org.apache.lucene.index.SegmentReader.(SegmentReader.java:95)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:141)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:235)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:382)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
at org.apache.lucene.search.SearcherManager.(SearcherManager.java:89)
at org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1530)
at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:277)
... 6 more

I have added these values in /etc/security/limit.conf file:

root - memlock unlimited
root soft nofile 800000
root hard nofile 1000000

and this chunk of code in logstash.in.sh script:

if [ "x$MAX_OPEN_FILES" != "x" ]; then
MAX_OPEN_FILES=100000

fi

logs are coming from agent but i think elasticsearch is not able to index them properly.

Please help me in resolving this issue. This is urgent.

Thanks & Regards,


(system) #2