Hi,
The Elasticsearch 5.3 service installed on our production server stopped by itself, and after looking at the log files I found the information below. Since the query is requesting 85,580 records, did all shards fail and Elasticsearch shut itself down? If that's the reason, how can I fix this issue? I looked at "max_result_window"; will setting the size to 10k help?
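For reference, this is the kind of settings change I was looking at (not applied anywhere yet; the host and the value are only placeholders):

curl -XPUT 'localhost:9200/elasticsearchlive/_settings' -H 'Content-Type: application/json' -d'
{
  "index.max_result_window": 100000
}'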
[2017-11-30T13:34:09,019][WARN ][r.suppressed ] path: /elasticsearchlive/searchentry/_search, params: {index=elasticsearchlive, type=searchentry}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onInitialPhaseResult(AbstractSearchAsyncAction.java:223) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:58) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:148) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:51) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1032) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1134) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1112) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.transport.TransportService$7.onFailure(TransportService.java:629) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:598) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) [elasticsearch-5.3.0.jar:5.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: org.elasticsearch.transport.RemoteTransportException: [web1][10.100.6.2:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: Result window is too large, from + size must be less than or equal to: [10000] but was [85580]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
at org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:202) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:90) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.search.SearchService.createContext(SearchService.java:480) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:444) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:331) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:328) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:618) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:613) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.3.0.jar:5.3.0]
... 3 more
[2017-11-30T15:31:03,379][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [web1] fatal error in thread [elasticsearch[web1][search][T#25]], exiting
java.lang.StackOverflowError: null
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1309) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
I think we have two events here. The first one happened at [2017-11-30T13:34:09,019] and is just a WARN. This warning means that you are trying to paginate too deep, typically when from + size exceeds 10,000 (the default limit). If you want to retrieve 85,580 documents, you may be interested in the scroll API or search_after.
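For example, with the index and type from your log, a scroll could look roughly like this (just a sketch; the host, the 1m keep-alive, the page size and the match_all query are placeholders you would adapt to your case):

curl -XPOST 'localhost:9200/elasticsearchlive/searchentry/_search?scroll=1m' -H 'Content-Type: application/json' -d'
{
  "size": 1000,
  "query": { "match_all": {} }
}'

# every response carries a _scroll_id; send it back until no more hits are returned
curl -XPOST 'localhost:9200/_search/scroll' -H 'Content-Type: application/json' -d'
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}'

search_after works in a similar page-by-page way, but it requires a sort (usually including a unique tie-breaker field) and you pass the sort values of the last hit into the next request.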
The second event happened about two hours later, at [2017-11-30T15:31:03,379], and caused the crash. It's not clear to me what might have caused this fatal error. If you have more evidence, please feel free to post it here and maybe we can determine the root cause.
@luiz.santos: apart from the events I posted, I don't have any other evidence. The same issue happened on another production server this week. Is there a way to find out why the Elasticsearch service is crashing?
Is your cluster suffering from out-of-memory errors?
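You can check heap pressure with the nodes stats or cat APIs, something like (host is a placeholder):

curl -XGET 'localhost:9200/_nodes/stats/jvm?pretty'
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'

and also look for OutOfMemoryError entries or long GC warnings in the Elasticsearch logs.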
Our production server is using an SSD drive with 120 GB of memory, and we allocated 30 GB to Elasticsearch.
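The heap is set in config/jvm.options, roughly like this (typed from memory, not copied from the server):

-Xms30g
-Xmx30g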
Yes, I did follow the production guides. We never had this issue when using Elasticsearch 1.7.1; I only started seeing it after upgrading Elasticsearch to 5.3.
Please let me know if you need more information.