Elasticsearch 2.3.3 encountered an OutOfMemoryError

Hi,
We run 4 ELK nodes in a cluster using JRE 1.8.0_77, Logstash 2.3.2, Elasticsearch 2.3.2, and Kibana 4.5.1, and in total more than three hundred client servers transfer Linux logs, Windows event logs, and IIS logs to the ELK cluster.
The following is our architecture and H/W configuration:

But now we have encountered a critical error. Logstash on the ELK01 host receives and processes the logs from the client servers. Every few days, elasticsearch.log records an "OutOfMemory" exception and the node drops out of the cluster; at the same time I cannot log in to the OS remotely over SSH, so I have to force a reboot.
Could anybody help fix it? Thanks.

```
[2016-07-25 01:22:35,370][WARN ][index.engine ] [elk04] [it_p5sfcs_iislog-2016.07.24][0] failed engine [refresh failed]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:517)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1931)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:455)
at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:286)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:261)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:251)
at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:104)
at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:137)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:154)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:672)
at org.elasticsearch.index.shard.IndexShard.refresh(IndexShard.java:661)
at org.elasticsearch.index.shard.IndexShard$EngineRefresher$1.run(IndexShard.java:1349)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
```
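For reference, "unable to create new native thread" usually means the JVM hit the OS per-user process limit (threads count against `ulimit -u` on Linux) rather than exhausting the heap. A sketch of how to check and raise that limit, assuming Elasticsearch runs as an `elasticsearch` service user:

```
# Show the max-user-processes limit for the user running Elasticsearch
# ("elasticsearch" is an assumption -- substitute your service user)
sudo -u elasticsearch bash -c 'ulimit -u'

# Raise it persistently in /etc/security/limits.conf, for example:
#   elasticsearch  soft  nproc  4096
#   elasticsearch  hard  nproc  4096
```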

How much data is in your cluster?

@warkolm
We configured number_of_replicas to 1, so in total it is about 50GB per day.

Ok, how many all together though?
How many indices, how many shards?

@warkolm
Every day we create 25 indices (each index has 5 primary shards and 5 replica shards), and we keep the past month of history indices.
So now we have 787 indices and 7872 shards in our cluster.

I'd say you are massively oversharded and that is creating heap pressure.
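To put rough numbers on it: 25 new indices a day × (5 primary + 5 replica) shards = 250 new shards per day, and at ~50GB of data a day that averages out to roughly 200MB per shard. Every shard is a full Lucene index with its own overhead (merge threads, file handles, heap for segment metadata), so thousands of tiny shards add up fast.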

@warkolm
Based on our ELK cluster H/W, could you advise how many indices and shards we should have?
Or how can I determine the optimum number of indices and shards?

ELK04 : HP DL580 Gen5
ELK01/02/03: HP DL380 Gen5

Aim for shard size <50GB.
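You can see where you stand today with the cat APIs (assuming a node is reachable on localhost:9200):

```
# List every shard with its on-disk size (the "store" column)
curl -s 'localhost:9200/_cat/shards?v'

# Per-index totals
curl -s 'localhost:9200/_cat/indices?v'
```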

@warkolm

  1. Is it right that the size of one index (5 primary shards + 5 replica shards) should be less than 500GB?
  2. Can I configure the Logstash output plugin to create indices by week? If yes, could you give me the date pattern?

Sounds about right.
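(5 primaries + 5 replicas = 10 shards, and 10 × 50GB = 500GB for the index as a whole.)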

Consult the documentation.
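For the weekly pattern, something like this in the elasticsearch output should do it (an untested sketch; the index prefix and hosts value are examples from this thread, and `%{+xxxx.ww}` is Joda-Time weekyear.week-of-weekyear):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # xxxx = Joda-Time weekyear, ww = week of weekyear,
    # so a new index is created once per week instead of once per day
    index => "it_p5sfcs_iislog-%{+xxxx.ww}"
  }
}
```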

@warkolm @DiscussBuster
Thank you very much! I will try to modify the Logstash configuration to create indices by week to reduce the number of shards, then check the effect. :grinning: