Search rejected execution on all indexes

We've been having an ongoing issue with our Elasticsearch server. Roughly once a week it becomes unresponsive to search queries (other request types, like _mapping, continue to work fine). Below is a snippet of one of the entries that keep being added; it seems to be logged for every index we have.

What puzzles me is that the logs report on indexes that are not even being used, so why would a query fail to execute? (stld_song_search_dev is a development index.)

Any help is greatly appreciated as this is our production server!

[2015-06-11 13:17:08,198][DEBUG][action.search.type       ] [search1] [stld_song_search_dev][2], node[oY5aYQSLQWOpVyS_DqeOkQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@561fc007] lastShard [true]
    org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@398541f8
    	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
    	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
    	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
    	at org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:551)
    	at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:228)
    	at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.sendExecuteFirstPhase(TransportSearchCountAction.java:71)
    	at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:176)
    	at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:158)
    	at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:55)
    	at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:45)
    	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
    	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
    	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
    	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
    	at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:62)
    	at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:39)
    	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
    	at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:98)
    	at org.elasticsearch.client.support.AbstractClient.multiSearch(AbstractClient.java:364)
    	at org.elasticsearch.action.search.MultiSearchRequestBuilder.doExecute(MultiSearchRequestBuilder.java:78)
    	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:91)
    	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:65)
    	at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:73)
    	at org.elasticsearch.search.suggest.phrase.PhraseSuggester.fetchMatchingDocCountResponses(PhraseSuggester.java:184)
    	at org.elasticsearch.search.suggest.phrase.PhraseSuggester.collate(PhraseSuggester.java:147)
    	at org.elasticsearch.search.suggest.phrase.PhraseSuggester.innerExecute(PhraseSuggester.java:108)
    	at org.elasticsearch.search.suggest.phrase.PhraseSuggester.innerExecute(PhraseSuggester.java:53)
    	at org.elasticsearch.search.suggest.Suggester.execute(Suggester.java:42)
    	at org.elasticsearch.search.suggest.SuggestPhase.execute(SuggestPhase.java:85)
    	at org.elasticsearch.search.suggest.SuggestPhase.execute(SuggestPhase.java:74)
    	at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:170)
    	at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:272)
    	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:283)
    	at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
    	at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
    	at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    	at java.lang.Thread.run(Thread.java:745)

I believe the issue is caused by the search thread pool queue filling up, but I'm just not able to see why this is happening, how to debug it, or how to fix it.
http://imgur.com/OyRRaFz
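For reference, the rejections can be confirmed per node with the cat thread pool API. A minimal sketch, assuming Elasticsearch 1.x and a node reachable on localhost:9200:

    # Show active, queued, and rejected counts for the search thread pool
    curl 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'

If search.rejected keeps climbing while CPU and memory stay low, the queue itself is the bottleneck rather than raw machine load.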

Full thread pools mean your cluster is overloaded. What sort of monitoring do you have in place?

We're monitoring the system with Graphite, so we have full CPU, memory, disk, etc. metrics. All of those look very low and steady.

We also recently installed Marvel to help debug this issue, and it shows the same low load except for a sudden spike in the search thread pool.
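While the spike is in progress, the hot threads API should show what the search threads are actually busy with. A sketch, again assuming a node on localhost:9200:

    # Dump stack samples of the busiest threads on each node
    curl 'localhost:9200/_nodes/hot_threads?threads=5'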

I'm assuming that spike is caused by some kind of application request, but is there any way to prevent the Elasticsearch node from becoming completely unresponsive during the spike? It never seems to recover.
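For reference, one stopgap is enlarging the search queue so short bursts don't trigger rejections. In 1.x the thread pool settings are dynamic, so this can be tried at runtime via the cluster settings API; a sketch (2000 is an arbitrary example value, and this only buys headroom rather than fixing whatever floods the queue):

    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": {
        "threadpool.search.queue_size": 2000
      }
    }'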

Thanks for any help!

You really need to find out what is generating these searches.
Try turning the slowlog thresholds down to catch what is running and trace it back from there.

Thanks for your help, this was our next step. We currently have the search slowlog configured as follows, but it's not catching anything.

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

I'll try lowering the trace threshold to 200ms and see if we can narrow down the queries causing the issues.
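Since the slowlog thresholds are dynamic index settings, they can be lowered without a restart. A sketch, assuming 1.x (_all can be swapped for a specific index name):

    curl -XPUT 'localhost:9200/_all/_settings' -d '{
      "index.search.slowlog.threshold.query.trace": "200ms"
    }'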

On a side note, we did make some application changes to prevent certain suggest queries from running all the time (a rough sketch of that kind of guard is included below, after the query). We're using the following to do "did you mean?"-style suggestions.

{
  "size": 0,
  "suggest": {
    "text": "search keyword not found",
    "simple_phrase": {
      "phrase": {
        "field": "title",
        "size": 1,
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        },
        "collate": {
          "query": {
            "match": {
              "{{field_name}}": "{{suggestion}}"
            }
          },
          "params": {
            "field_name": "title"
          },
          "preference": "_primary"
        }
      }
    }
  }
}

These suggest queries seem to be pretty expensive, since they take 250ms on average to return. And from what I've seen with Elasticsearch, that's a long time (queries usually return in under 30ms).
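An application-side guard like the one mentioned above could look roughly like this: only send the expensive suggest request when the main search comes back empty. This is a hypothetical sketch; the index name stld_song_search, the suggest.json body file, and the jq dependency are all assumptions:

    # Run the cheap main search first (count only)
    hits=$(curl -s 'localhost:9200/stld_song_search/_search?search_type=count' \
      -d '{"query": {"match": {"title": "search keyword not found"}}}' | jq '.hits.total')

    # Only fall back to the "did you mean?" suggestion when nothing matched
    if [ "$hits" -eq 0 ]; then
      # suggest.json holds the phrase-suggest request body shown above
      curl -s 'localhost:9200/stld_song_search/_search' -d @suggest.json
    fi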

250ms is still pretty quick :stuck_out_tongue:

What index pattern are you using? And are documents added continuously?