Hi
We have an Elasticsearch cluster (version 7.8.1) with 7 nodes; each node has 96 CPU cores and 192 GB of RAM.
During a stress test with the Locust tool, simulating 1150 concurrent users, each one running a _search request asking for the first 1000 documents of an index, we reached 230 requests/second, but some of the requests fail (a sketch of the Locust task is below the error list):
- The Locust tool reports the error: HTTPError('429 Client Error: Too Many Requests for url: http://xxxxx:9200/indexname/_search')
- Kibana, used during the stress test with many different complex panels, reports the error:
rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@76c9a58e on
QueueResizingEsThreadPoolExecutor[name = xxxxx/search, queue capacity = 1000, min
queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate
= 1s, task execution EWMA = 242.6ms, adjustment amount = 50,
org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@50977828[Running,
pool size = 145, active threads = 145, queued tasks = 1000, completed tasks = 5575837]]
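In case it helps, this is roughly the shape of the Locust test. It's a minimal sketch only: the class and task names, the pacing, and the match_all body are my own placeholders, and the host/index names are the masked ones from the error above.

```python
# Minimal sketch of the Locust user class; names and pacing are illustrative.
from locust import HttpUser, constant, task

class SearchUser(HttpUser):
    host = "http://xxxxx:9200"   # masked host from the error message
    wait_time = constant(1)      # pause between requests per simulated user

    @task
    def first_1000_docs(self):
        # Ask for the first 1000 documents of the index
        self.client.get(
            "/indexname/_search",
            json={"size": 1000, "query": {"match_all": {}}},
        )
```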
I've been searching a bit about this error and how to fix it, and it seems related to this setting:
thread_pool.search.max_queue_size
Querying the cluster configuration returns the following values:
GET /_cluster/settings?include_defaults=true
....
"search" : {
"max_queue_size" : "1000",
"queue_size" : "1000",
"size" : "145",
"auto_queue_frame_size" : "2000",
"target_response_time" : "1s",
"min_queue_size" : "1000"
},
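While the test runs, the per-node queue depth and rejection counts can also be watched with the _cat thread pool API (the column selection below is just a suggestion):

```
GET /_cat/thread_pool/search?v&h=node_name,active,queue,rejected,completed
```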
I changed the thread_pool.search.max_queue_size parameter in the elasticsearch.yml config file on all nodes (I first tried a POST to _cluster/settings, but it returned an error). I can see the changed value in _cluster/settings, but it also comes with a warning that the parameter is deprecated. Either way, rerunning the same stress test with 1150 concurrent users gives the same errors.
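For reference, thread pool settings are static node settings, so they can't be set through the cluster settings API; they have to go into elasticsearch.yml and need a node restart to take effect. The value below is illustrative, not necessarily the one I used:

```yaml
# elasticsearch.yml (every node, restart required); the value is an example
thread_pool.search.max_queue_size: 2000
```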
With fewer concurrent users there are no errors; with more than 1150 users the errors occur even more frequently.
Can anyone help me with this?
Thanks