Rejected execution during search

Hi all,

I am running a single instance of the Elastic Stack, version 6.2.3, on a single server (8 CPUs, 128 GB RAM).

When I load a complex Kibana dashboard over a 30-day time range, I see the following issues:

  • Kibana (where I query the dashboard) runs into its timeout (60 s). If I wait a minute or so and retry the query, the data loads. I assume the original query is still running and the results are cached by then (see the kibana.yml sketch after this list).

  • While the long query is running, many other sessions get the following exception:

    Error: Request to Elasticsearch failed:
    {
      "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
          "type": "es_rejected_execution_exception",
          "reason": "rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@4fedd8e5 on QueueResizingEsThreadPoolExecutor[name = node-1/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 3.5ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@87195e1[Running, pool size = 7, active threads = 7, queued tasks = 1039, completed tasks = 91357857]]"
        }
      },
      "status": 503
    }
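
Regarding the first point: if it is only the dashboard request in Kibana that times out (and not Elasticsearch itself), I could probably buy some time by raising the request timeout in kibana.yml. A minimal sketch, assuming elasticsearch.requestTimeout (in milliseconds) is the 60 s timeout I am hitting:

    # kibana.yml
    # Raise the Elasticsearch request timeout from 60 s to 120 s.
    # Assumption: this is the timeout the dashboard runs into.
    elasticsearch.requestTimeout: 120000

Of course that would only hide the symptom; the underlying queries stay slow.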
    

Is my understanding correct that the queue is full and therefore no new requests can be buffered?
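
If that is right, the rejected counter of the search pool should climb while the dashboard loads. A minimal check, assuming these _cat/thread_pool columns are available in 6.2.3:

    GET _cat/thread_pool/search?v&h=node_name,name,active,queue,rejected,completed

If rejected keeps increasing while queue sits at its limit, that would match the es_rejected_execution_exception above.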

My elasticsearch.yml changes compared to the defaults look like this:

# threadpool configuration
thread_pool.search.queue_size: 5000

# tuning
processors: 4

# xpack configuration
xpack.security.enabled: false

In X-Pack monitoring I can see that the search latency is increasing and that CPU usage is at its maximum.
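
As a cross-check outside the monitoring UI, something like this should show per-node CPU and load directly from the API (assuming the usual _cat/nodes columns):

    GET _cat/nodes?v&h=name,cpu,load_1m,load_5m,heap.percent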

OK, one step will be to increase the CPU limit, because Metricbeat shows the server's total CPU usage is only around 54%. Logstash will not need the rest, so limiting Elasticsearch to 6 or 7 processors should work.
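
If I read the thread pool documentation correctly, the search pool size is derived from processors as int((processors * 3) / 2) + 1, which would explain the "pool size = 7" above with processors: 4. The change would then be something like this (with 6 processors the search pool should grow to 10 threads, if that formula holds):

    # elasticsearch.yml
    # Assumption: search pool size = int((processors * 3) / 2) + 1,
    # so 6 processors -> 10 search threads (currently 4 -> 7).
    processors: 6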

But what I don't understand is the error message about the full queue:

name = node-1/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000

When I check with GET _nodes I see the following snippets:

"settings": {
    ...
    thread_pool": {
      "search": {
        "queue_size": "5000"
      }
    },

    ...

"thread_pool": {

    ...

    "search": {
          "type": "fixed_auto_queue_size",
          "min": 7,
          "max": 7,
          "queue_size": 5000
    },

So why does the exception report 1000 and not 5000?
How do I increase the queue limit for node-1/search?
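
My guess (not verified) is that for a fixed_auto_queue_size pool the live queue is clamped by min_queue_size and max_queue_size, which still default to 1000, so setting only queue_size changes the initial value but not the bound. If that is right, something like this would be needed:

    # elasticsearch.yml
    # Assumption: the auto-resizing search queue is bounded by min/max,
    # so raising only queue_size leaves the clamp at the default of 1000.
    thread_pool.search.queue_size: 5000
    thread_pool.search.min_queue_size: 5000
    thread_pool.search.max_queue_size: 5000

Though I suspect a bigger queue only delays the rejections as long as the CPUs are already saturated.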

Any other ideas for tuning?

Thanks a lot, Andreas

Is it possible to update the processors directive at runtime?
