Hey All,
I am running a three-node cluster on ES 2.2.0. Each node is a 32-core, 64 GB instance, and ES has around 17 GB of RAM allocated. I have around 350 million records. All three nodes are performing very badly and throwing every possible kind of exception.
A few of them are below:
-
ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [myindex-2016-05-01T12]) within 30s]
-
Caused by: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: OutOfMemoryError[Java heap space];
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:409)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:113)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:364)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:376)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:144)
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:175)
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:151)
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:210)
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:204)
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:142)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:155)
at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:318)
-
01-May-2016 12:25:41,293 INFO [transport] (elasticsearch[Dan Ketch][generic][T#2238]) [Dan Ketch] failed to get local cluster state for {#transport#-2}{17.30.25.25}{17.30.25.25:9300}, disconnecting...: ReceiveTimeoutTransportException[[][17.30.25.25:9300][cluster:monitor/state] request_id [40958] timed out after [15000ms]]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:645) [elasticsearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_72]
-
java.util.concurrent.TimeoutException: Failed to acknowledge mapping update within [30s]
-
[2016-05-01 12:29:01,806][WARN ][transport ] [Node2] Received response for a request that has timed out, sent [17789ms] ago, timed out [2788ms] ago, action [cluster:monitor/nodes/stats[n]], node [{Node0}{HQbDpWZ7RcGIOEoKslkR2Q}{17.30.25.25}{17.30.25.25:9300}{master=true}], id [369038]
And I have the following settings in Elasticsearch:
---------------------------------- Cache Size --------------------------------
indices.fielddata.cache.size: 70%
indices.breaker.fielddata.limit: 75%
---------------------------------- Thread pool --------------------------------
threadpool.index.queue_size: 2000
threadpool.search.queue_size: 2000
threadpool.bulk.queue_size: 2000
bootstrap.mlockall: true
indices.store.throttle.max_bytes_per_sec: 100mb
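To put the percentages above in absolute terms, here is a rough bit of arithmetic. It assumes the roughly 17 GB figure is the JVM heap size, which is what these percentage settings are measured against; the helper name is only illustrative.

```python
# Rough arithmetic only: what the percentage settings mean in absolute terms.
# Assumption: the "17GB of RAM allocated" figure is the JVM heap size.
HEAP_GB = 17.0


def heap_share_gb(percent, heap_gb=HEAP_GB):
    """Return the absolute size in GB of a given percentage of the heap."""
    return heap_gb * percent / 100.0


fielddata_cache_gb = heap_share_gb(70)    # indices.fielddata.cache.size: 70%
fielddata_breaker_gb = heap_share_gb(75)  # indices.breaker.fielddata.limit: 75%
remaining_gb = HEAP_GB - fielddata_cache_gb  # heap left if the cache fills up

print(f"fielddata cache cap:     {fielddata_cache_gb:.2f} GB")
print(f"fielddata breaker limit: {fielddata_breaker_gb:.2f} GB")
print(f"left for everything else if the cache is full: {remaining_gb:.2f} GB")
```

So with these settings, fielddata alone is allowed to grow to well over half of the heap before anything is evicted or rejected, leaving only the remainder for queries, indexing buffers, and cluster state handling.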
Every 10 minutes I run an aggregation over the last 24 hours of records in some of the indices, since new data is added every minute.
Can someone please suggest what is going wrong here? I think the OOM is caused by the aggregation, but how can I avoid it?
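A minimal sketch of that polling pattern, to make the shape of the load concrete. It assumes the indices are hourly and named like `myindex-2016-05-01T12` (as in the put-mapping error above); the field names `timestamp` and `status` and both helper functions are hypothetical placeholders, not the real mapping or client code.

```python
from datetime import datetime, timedelta


def hourly_indices(now, hours=24, prefix="myindex-"):
    """List the hourly index names touched by the last `hours` hours.

    A 24-hour window that ends mid-hour overlaps 25 hourly indices,
    so the oldest partial hour is included as well.
    """
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return [
        prefix + (top_of_hour - timedelta(hours=h)).strftime("%Y-%m-%dT%H")
        for h in range(hours, -1, -1)
    ]


def build_request(timestamp_field="timestamp", group_field="status"):
    """Build a search body: last 24 hours, aggregation only, no hits.

    Both field names are hypothetical placeholders.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not documents
        "query": {"range": {timestamp_field: {"gte": "now-24h", "lte": "now"}}},
        "aggs": {"by_group": {"terms": {"field": group_field}}},
    }


indices = hourly_indices(datetime(2016, 5, 1, 12, 25, 41))
print(len(indices), "indices, from", indices[0], "to", indices[-1])
print(build_request())
```

Each run of this pattern therefore touches every hourly index in the window, and it repeats every 10 minutes while new hourly indices (and their mappings) keep being created.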
Regards,