Elasticsearch: changing thread_pool.search.max_queue_size on a high load cluster

Hi

We have an Elasticsearch cluster (version 7.8.1) with 7 nodes; each node has 96 CPU cores and 192 GB of RAM.

During a stress test using the Locust tool, simulating 1150 concurrent users, each one runs a _search request asking for the first 1000 documents of an index. We got 230 requests/second, but some of them are failing:

  • the Locust tool gives me the error "HTTPError('429 Client Error: Too Many Requests for url: http://xxxxx:9200/indexname/_search')"

  • Kibana, used during the stress test (with many different complex panels), gives me the error:

       rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@76c9a58e on 

       QueueResizingEsThreadPoolExecutor[name = xxxxx/search, queue capacity = 1000, min 
       queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate
        = 1s, task execution EWMA = 242.6ms, adjustment amount = 50,
        
       org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@50977828[Running, 
       pool size = 145, active threads = 145, queued tasks = 1000, completed tasks = 5575837]]

I've searched a bit about this error and how to fix it, and it seems related to the parameter:

thread_pool.search.max_queue_size
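As far as I understand, thread pool settings are node-level, so they can only be set in elasticsearch.yml on each node (followed by a restart), not dynamically. The value below is just an example, not what I necessarily want:

       thread_pool.search.max_queue_size: 2000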

Querying my cluster configuration gives me these values:

       GET /_cluster/settings?include_defaults=true

       ....

      "search" : {
        "max_queue_size" : "1000",
        "queue_size" : "1000",
        "size" : "145",
        "auto_queue_frame_size" : "2000",
        "target_response_time" : "1s",
        "min_queue_size" : "1000"
      },

I changed the thread_pool.search.max_queue_size parameter in the elasticsearch.yml config file on all nodes (I first tried a POST to _cluster/settings, but that gave me an error). I can see the value changed in _cluster/settings, but it also gives me a warning that the parameter is deprecated. Anyway, I ran the same stress test and got the same errors with 1150 concurrent users.
Running fewer concurrent users, there are no errors. Running more than 1150 users, the errors become more frequent.
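To confirm it is the search queue filling up, I watch the per-node queue and rejection counters during the test with the cat thread pool API:

       GET /_cat/thread_pool/search?v&h=node_name,active,queue,rejected,completed

If the rejected column grows during the test, the 429 errors are coming from this queue.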

Can anyone help me with this?

Thanks

How much data does each node hold? Is every search targeting the full data set? How many primary and replica shards do you have configured? What type of storage are you using? Local SSDs?

  • The index I'm searching on has 6.4 Million documents

  • The query is sent only to node 1 and asks for 1000 documents

  • The shards are:

xxxx 1 r STARTED 1052623 385.1mb 10.57.40.38  node02
xxxx 1 p STARTED 1052623 381.7mb 10.57.40.33  node01
xxxx 2 p STARTED 1052209 381.9mb 10.57.40.102 node05
xxxx 2 r STARTED 1052209 384.2mb 10.57.40.77  node04
xxxx 0 p STARTED 1052570 380.9mb 10.57.40.64  node03
xxxx 0 r STARTED 1052570 383.4mb 10.57.40.77  node04
  • Nodes 1-5 are dedicated to the 'hot' tier and nodes 6-7 are for cold indexes; this is why there are no shards for this index on nodes 6 or 7

  • Storage is remote (iSCSI). Nodes 1-5 are using 4 GB of a total of 3 TB. Nodes 6-7 are using 1 TB of a total of 20 TB, although they don't currently hold this index

I also tried asking for only 100 documents instead of 1000, and then I can simulate more than 1150 users; for example, I tried 2000 users and got no errors (which seems logical, by the way).
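For reference, the test query itself is nothing special; with the index name as a placeholder, it is roughly:

       GET /indexname/_search
       {
         "query": { "match_all": {} },
         "size": 1000
       }

and for the lighter test I only change "size" to 100.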

What I see is that I'm hitting the search queue limit. The configuration parameter seems to be deprecated (although the documentation doesn't say so); maybe I have to use another parameter?

You have provided info for one index, which looks small. You mention hot and cold nodes, which makes me believe you might have time-based indices. If that is the case, how many indices will there be? How many of these would be queried at once?

Can you describe the use case you are optimising for? What would a production setting and load look like? Optimising for query concurrency for a single index might not be valid if you later intend to query more indices.

Hi

As I said, the index I'm querying is located on nodes 1-5.

The only thing I'm testing is a _search operation on that index, asking for the first 1000 documents. These searches are run using the Locust tool, which runs the query from 1150 different threads.

It's as simple as that :slight_smile:

For that use case I would change the index to have one primary shard and 5 replica shards configured. That should improve query throughput. That should allow 6 nodes to serve queries based on a single shard and the full data set should be cached.
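The replica count can be changed on the fly with a settings update (index name as a placeholder):

       PUT /indexname/_settings
       {
         "index": {
           "number_of_replicas": 5
         }
       }

Note that the primary shard count cannot be changed in place; going from 3 primaries to 1 requires reindexing into a new index or using the _shrink API.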

OK, that's nice. But what I'm trying to do is change the Elasticsearch limits. Remember the error message:

    QueueResizingEsThreadPoolExecutor[name = xxxxx/search, queue capacity = 1000, min 
       queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate
        = 1s, task execution EWMA = 242.6ms, adjustment amount = 50,

There must be a way to change this 1000 limit...

I would recommend first optimising your indexing strategy as I outlined. Having 3 primary shards and 1 replica for that data volume is very inefficient. I believe this will solve your problem. You should avoid altering the queue and thread pool settings.

To me it sounds like you are trying to optimise the wrong thing.

Why?

These are expert level settings and can have side effects. I would recommend optimising the use case and sharing first.

Based on your description, your cluster is clearly oversized if you are only going to have this single index in it. As you have a hot-cold setup, it sounds like you will be using time-based indices. If that is the case, there is IMHO no point in optimizing and changing advanced settings based on the test you are running, as the cluster will behave differently when there is a lot more data in it and more concurrent querying and indexing going on. In your current test your entire dataset can be cached, which gives a certain performance profile. If the nodes are full of data, disk performance is likely to become the limiting factor, meaning the cluster needs to be optimized differently. I would instead recommend optimizing your sharding and then benchmarking with the amount of data and load the cluster is likely to be under in production.

I would also recommend reading this blog.
