Hi all,
I am running a single instance of the Elastic Stack, version 6.2.3, on a single server (8 CPUs, 128 GB RAM).
When I load a complex Kibana dashboard over a 30-day range, I see the following issues:
- Kibana (where I query the dashboard) runs into its timeout (60 s). If I wait a minute or so and retry the query, the data loads. I assume the original query is still running and its results are then served from cache.
- While the long query is running, a lot of other sessions get the following exception:
```
Error: Request to Elasticsearch failed: {
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "es_rejected_execution_exception",
      "reason": "rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@4fedd8e5 on QueueResizingEsThreadPoolExecutor[name = node-1/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 3.5ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@87195e1[Running, pool size = 7, active threads = 7, queued tasks = 1039, completed tasks = 91357857]]"
    }
  },
  "status": 503
}
```
Is my understanding correct that the queue is full and therefore no new requests can be buffered?
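To keep an eye on this while the dashboard loads, I am watching the live queue depth and rejection counters with the `_cat` thread pool API (column names as I understand them in 6.x):

```
GET _cat/thread_pool/search?v&h=node_name,name,active,queue,rejected,queue_size
```

The `rejected` counter keeps climbing whenever the dashboard query runs, which seems to match the exception above.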
My elasticsearch.yml changes compared to the defaults look like this:

```yaml
# threadpool configuration
thread_pool.search.queue_size: 5000

# tuning
processors: 4

# xpack configuration
xpack.security.enabled: false
```
In X-Pack Monitoring I can see that search latency is increasing and that CPU usage is at its maximum.
OK, one step will be raising the processors limit: Metricbeat shows the server as a whole at only 54% CPU, so Logstash will not need the rest, and allowing Elasticsearch 6 or 7 processors should work.
But what I don't understand is the error message about the full queue:
```
name = node-1/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000
```
When I check with GET _nodes I can see the following snippets:

```
"settings": {
  ...
  "thread_pool": {
    "search": {
      "queue_size": "5000"
    }
  },
...
"thread_pool": {
  ...
  "search": {
    "type": "fixed_auto_queue_size",
    "min": 7,
    "max": 7,
    "queue_size": 5000
  },
```
So why does the exception report 1000 and not 5000?
How do I increase the queue limit for node-1/search?
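Given the "fixed_auto_queue_size" type shown above, my guess is that the auto-resizing queue is clamped by separate bounds, so `queue_size` alone may not raise the cap. A sketch of what I would try in elasticsearch.yml, assuming the 6.x `min_queue_size`/`max_queue_size` settings are what bound the resizer (please correct me if that is wrong):

```yaml
# thread_pool.search is of type fixed_auto_queue_size in 6.x:
# queue_size only sets the initial size, while the auto-resizer
# stays clamped between min_queue_size and max_queue_size.
thread_pool.search.queue_size: 5000      # initial queue size
thread_pool.search.min_queue_size: 1000  # lower bound for auto-resizing
thread_pool.search.max_queue_size: 5000  # upper bound for auto-resizing
```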
Any other ideas for tuning?
Thanks a lot, Andreas