Https://discuss.elastic.co/c/elasticsearch


(supermario) #1

Hi all,
I have a two-node Elasticsearch cluster, and I found that the cluster health is red. I tried deleting the failing index, but the health is still red (I have since cleaned out all indices). I also found that Logstash threw an "OutOfMemory" exception from its output (Logstash outputs to Elasticsearch). I have increased the heap size, and it is currently working normally.
I don't know whether that failure affects the health status, so I want to know how to get the cluster health back to normal. Thanks all.


(Magnus Bäck) #2

A red cluster health means that one or more shards are unavailable (e.g. because the node(s) they were on are down or because of data corruption preventing them from being loaded). Dropping the indexes with incomplete shards should fix the situation and bring the cluster back to green (or at least yellow). If you can't drop the indexes and lose the data you'd have to reindex the existing data to a new index (which would contain the healthy documents from the partial original index) and then drop the red index.
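
To see which indexes are still red, the cluster health API can be asked for per-index detail with `level=indices`. Below is a minimal Python sketch, not a definitive tool. It assumes a node reachable at `http://localhost:9200` (an assumption, adjust for your setup), and the helper names `fetch_health` and `red_indices` are made up for illustration.

```python
import json
from urllib.request import urlopen


def red_indices(health):
    """Return the sorted names of indices whose status is "red".

    `health` is the parsed JSON body of GET /_cluster/health?level=indices.
    """
    indices = health.get("indices", {})
    return sorted(
        name for name, info in indices.items() if info.get("status") == "red"
    )


def fetch_health(base_url="http://localhost:9200"):
    """Fetch per-index cluster health.

    base_url is an assumed address of one node in the cluster.
    """
    with urlopen(base_url + "/_cluster/health?level=indices") as resp:
        return json.load(resp)
```

Calling `red_indices(fetch_health())` should return the names of the indexes that would need to be dropped or reindexed.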

Your ES log file probably contains clues about what has been going on. If you can't interpret the logs maybe we can.

Next time please, please pick a proper subject for your topic instead of "Https://discuss.elastic.co/c/elasticsearch".


(supermario) #3

Hi, Magnus Bäck. Thank you for your reply. The log file is too large to post in full, and all of its contents are from today, so I copied only some of the error messages.

  1. org.elasticsearch.index.engine.RefreshFailedEngineException: [logstash-iis-2015.10.20][0] Refresh failed
    at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:575)
    at org.elasticsearch.index.shard.IndexShard.refresh(IndexShard.java:595)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.FileNotFoundException: /usr/local/elasticsearch-1.7.1/data/elasticsearch/nodes/0/indices/logstash-iis-2015.10.20/0/index/_1wm_Lucene410_0.dvm (Too many open files)
    at java.io.FileOutputStream.open0(Native Method)

  2. [2015-10-20 14:10:39,305][WARN ][netty.channel.socket.nio.AbstractNioSelector] Failed to accept a connection.
    java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)

  3. [2015-10-20 14:06:33,738][WARN ][cluster.action.shard ] [Green Goblin] [logstash-iis-2015.10.20][3] received shard failed for [logstash-iis-2015.10.20][3], node[Pw6wO6y6RLCWKCV1jNpcWA], [P], s[INITIALIZING], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-10-20T06:06:33.558Z], details[shard failure [engine failure, reason [refresh failed]][FileNotFoundException[/usr/local/elasticsearch-1.7.1/data/elasticsearch/nodes/0/indices/logstash-iis-2015.10.20/3/index/_1pn_Lucene410_0.dvd (Too many open files)]]]], indexUUID [Io8dB0N2Q9CTEmL4UlVmFQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-iis-2015.10.20][3] failed recovery]; nested: EngineCreationFailureException[[logstash-iis-2015.10.20][3] failed to open reader on writer]; nested: FileSystemException[/usr/local/elasticsearch-1.7.1/data/elasticsearch/nodes/0/indices/logstash-iis-2015.10.20/3/index/_1h7.si: Too many open files]; ]]

4. [2015-10-20 14:06:33,731][WARN ][indices.cluster ] [Green Goblin] [[logstash-iis-2015.10.20][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logstash-iis-2015.10.20][3] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logstash-iis-2015.10.20][3] failed to open reader on writer
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:201)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:146)

... 3 more

Caused by: java.nio.file.FileSystemException: /usr/local/elasticsearch-1.7.1/data/elasticsearch/nodes/0/indices/logstash-iis-2015.10.20/3/index/_1h7.si: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)

at java.nio.channels.FileChannel.open(FileChannel.java:287)

5. [2015-10-20 14:06:33,730][DEBUG][action.bulk ] [Green Goblin] [logstash-iis-2015.10.20][3] failed to execute bulk item (index) index {[logstash-iis-2015.10.20][iis][AVCD2zj5662l9GX22n5O], source[{"EventReceivedTime":"2015-10-20 14:05:31","SourceModuleName":"iis_7","SourceModuleType":"im_file","date":"2015-10-20","time":"06:05:26","s-ip":"192.168.10.202","cs-method":"GET","cs-uri-stem":"/Scripts/PublicFunction.js","cs-uri-query":"v=20150109","cs-username":"sunshunxiangn","c-ip":"180.173.165.156","csUser-Agent":"Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/31.0.1650.63+Safari/537.36","cs-Referer":","sc-status":304,"sc-substatus":0,"sc-win32-status":0,"sc-bytes":211,"cs-bytes":2381,"time-taken":51,"ErrorMessage":"IIS Log","Trace":"2015-10-20 06:05:26 192.168.10.202 GET /Scripts/PublicFunction.js v=20150109 sunshunxiang@ 180.173.165.156 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/31.0.1650.63+Safari/537.36 oms.cn 304 0 0 211 2381 51","EventTime":"2015-10-20 06:05:26","@version":"1","@timestamp":"2015-10-20T06:05:31.099Z","host":"192.168.10.202","type":"iis","item":"oms.cn","cs-host":"oms.n"}]}
org.elasticsearch.index.engine.CreateFailedEngineException: [logstash-iis-2015.10.20][3] Create failed for [iis#AVCD2zj5662l9GX22n5O]

at java.lang.Thread.run(Thread.java:745)

Caused by: java.io.FileNotFoundException: /usr/local/elasticsearch-1.7.1/data/elasticsearch/nodes/0/indices/logstash-iis-2015.10.20/3/index/_1wt.fdx (Too many open files)
at java.io.FileOutputStream.open0(Native Method)


(Magnus Bäck) #4

Too many open files

Your open file limit is too low. What's your current limit (see the first link below to find out what it is)?
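
On Linux, one way to check the limit that actually applies to the running Elasticsearch process is to read `/proc/<pid>/limits`. Here is a rough Python sketch under that assumption. The function names are illustrative only, and the pid has to be that of your own Elasticsearch JVM.

```python
def _to_limit(value):
    """Convert a limits field to int, or None when it is "unlimited"."""
    return None if value == "unlimited" else int(value)


def parse_open_file_limits(limits_text):
    """Return (soft, hard) for "Max open files" from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            fields = line.split()
            return _to_limit(fields[3]), _to_limit(fields[4])
    raise ValueError("no 'Max open files' line found")


def open_file_limits(pid):
    """Read the open file limits of a running process (Linux only)."""
    with open("/proc/{}/limits".format(pid)) as f:
        return parse_open_file_limits(f.read())
```

For example, `open_file_limits(12345)` (with 12345 standing in for the real pid) returns the soft and hard limits as a tuple. The soft limit is the one the process runs into when it reports "Too many open files".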

