Adjusting threadpool queue size to reduce number rof rejected operations

michele_crudele · May 10, 2019, 10:03pm

I'm looking for suggested values of threadpools possibly based on deployments size. More specifically, is there any suggested configuration for threadpools for a deployment like this?

6 nodes Elasticsearch 5.6 deployment with these jvm mem characteristics:

        "mem": {
          "heap_init_in_bytes": 6685720576,
          "heap_max_in_bytes": 6407192576,
          "non_heap_init_in_bytes": 2555904,
          "non_heap_max_in_bytes": 0,
          "direct_max_in_bytes": 6407192576
        },

All nodes have role data and ingest.

I'm asking because we are start getting es_rejection_exception for several operations (mostly bulk, index, and search), which are related to threadpool queue size. This is how threadpools are configured on each of the 6 nodes:

      "thread_pool": {
        "force_merge": {
          "type": "fixed",
          "min": 1,
          "max": 1,
          "queue_size": -1
        },
        "fetch_shard_started": {
          "type": "scaling",
          "min": 1,
          "max": 48,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "listener": {
          "type": "fixed",
          "min": 10,
          "max": 10,
          "queue_size": -1
        },
        "index": {
          "type": "fixed",
          "min": 24,
          "max": 24,
          "queue_size": 200
        },
        "refresh": {
          "type": "scaling",
          "min": 1,
          "max": 10,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "generic": {
          "type": "scaling",
          "min": 4,
          "max": 128,
          "keep_alive": "30s",
          "queue_size": -1
        },
        "warmer": {
          "type": "scaling",
          "min": 1,
          "max": 5,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "search": {
          "type": "fixed",
          "min": 37,
          "max": 37,
          "queue_size": 1000
        },
        "flush": {
          "type": "scaling",
          "min": 1,
          "max": 5,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "fetch_shard_store": {
          "type": "scaling",
          "min": 1,
          "max": 48,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "management": {
          "type": "scaling",
          "min": 1,
          "max": 5,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "get": {
          "type": "fixed",
          "min": 24,
          "max": 24,
          "queue_size": 1000
        },
        "bulk": {
          "type": "fixed",
          "min": 24,
          "max": 24,
          "queue_size": 200
        },
        "snapshot": {
          "type": "scaling",
          "min": 1,
          "max": 5,
          "keep_alive": "5m",
          "queue_size": -1
        }
      }

Is there any room for improvement by expanding the threadpools? Thanks in advance.

Christian_Dahlqvist · May 11, 2019, 6:15am

How many indices and shards do you have in the cluster? How much data?

michele_crudele · May 11, 2019, 10:04am

Actually we have ~10 indices for ~30gb data. However, the number of indices and amount of data is increasing. This is what I see querying threadpool (/_cat/thread_pool?v), which I don't like honestly:

node_name            name                     active queue rejected
node_name_xxxxxxx_17 bulk                     0     0    12659 REJECTED
node_name_xxxxxxx_17 fetch_shard_started      0     0        0
node_name_xxxxxxx_17 fetch_shard_store        0     0        0
node_name_xxxxxxx_17 flush                    0     0        0
node_name_xxxxxxx_17 force_merge              0     0        0
node_name_xxxxxxx_17 generic                  0     0        0
node_name_xxxxxxx_17 get                      0     0        0
node_name_xxxxxxx_17 index                    0     0      229 REJECTED
node_name_xxxxxxx_17 listener                 0     0        0
node_name_xxxxxxx_17 management               1     0        0
node_name_xxxxxxx_17 refresh                  0     0        0
node_name_xxxxxxx_17 search                   0     0     2575 REJECTED
node_name_xxxxxxx_17 snapshot                 0     0        0
node_name_xxxxxxx_17 warmer                   0     0        0
node_name_xxxxxxx_3  bulk                     0     0    24433 REJECTED
node_name_xxxxxxx_3  fetch_shard_started      0     0        0
node_name_xxxxxxx_3  fetch_shard_store        0     0        0
node_name_xxxxxxx_3  flush                    0     0        0
node_name_xxxxxxx_3  force_merge              0     0        0
node_name_xxxxxxx_3  generic                  0     0        0
node_name_xxxxxxx_3  get                      0     0        0
node_name_xxxxxxx_3  index                    0     0        0
node_name_xxxxxxx_3  listener                 0     0        0
node_name_xxxxxxx_3  management               1     0        0
node_name_xxxxxxx_3  refresh                  0     0        0
node_name_xxxxxxx_3  search                   0     0        0
node_name_xxxxxxx_3  snapshot                 0     0        0
node_name_xxxxxxx_3  warmer                   0     0        0
node_name_xxxxxxx_18 bulk                     0     0     2322 REJECTED
node_name_xxxxxxx_18 fetch_shard_started      0     0        0
node_name_xxxxxxx_18 fetch_shard_store        0     0        0
node_name_xxxxxxx_18 flush                    0     0        0
node_name_xxxxxxx_18 force_merge              0     0        0
node_name_xxxxxxx_18 generic                  0     0        0
node_name_xxxxxxx_18 get                      0     0        0
node_name_xxxxxxx_18 index                    0     0        0
node_name_xxxxxxx_18 listener                 0     0        0
node_name_xxxxxxx_18 management               1     0        0
node_name_xxxxxxx_18 refresh                  0     0        0
node_name_xxxxxxx_18 search                   0     0      507 REJECTED
node_name_xxxxxxx_18 snapshot                 0     0        0
node_name_xxxxxxx_18 warmer                   0     0        0
node_name_xxxxxxx_12 bulk                     0     0    39731 REJECTED
node_name_xxxxxxx_12 fetch_shard_started      0     0        0
node_name_xxxxxxx_12 fetch_shard_store        0     0        0
node_name_xxxxxxx_12 flush                    0     0        0
node_name_xxxxxxx_12 force_merge              0     0        0
node_name_xxxxxxx_12 generic                  0     0        0
node_name_xxxxxxx_12 get                      0     0        0
node_name_xxxxxxx_12 index                    0     0     1931 REJECTED
node_name_xxxxxxx_12 listener                 0     0        0
node_name_xxxxxxx_12 management               1     0        0
node_name_xxxxxxx_12 refresh                  0     0        0
node_name_xxxxxxx_12 search                   0     0    82330 REJECTED
node_name_xxxxxxx_12 snapshot                 0     0        0
node_name_xxxxxxx_12 warmer                   0     0        0
node_name_xxxxxxx_2  bulk                     0     0        0
node_name_xxxxxxx_2  fetch_shard_started      0     0        0
node_name_xxxxxxx_2  fetch_shard_store        0     0        0
node_name_xxxxxxx_2  flush                    0     0        0
node_name_xxxxxxx_2  force_merge              0     0        0
node_name_xxxxxxx_2  generic                  0     0        0
node_name_xxxxxxx_2  get                      0     0        0
node_name_xxxxxxx_2  index                    0     0     1703 REJECTED
node_name_xxxxxxx_2  listener                 0     0        0
node_name_xxxxxxx_2  management               1     0        0
node_name_xxxxxxx_2  refresh                  0     0        0
node_name_xxxxxxx_2  search                   0     0        0
node_name_xxxxxxx_2  snapshot                 0     0        0
node_name_xxxxxxx_2  warmer                   0     0        0
node_name_xxxxxxx_3  bulk                     0     0     5452 REJECTED
node_name_xxxxxxx_3  fetch_shard_started      0     0        0
node_name_xxxxxxx_3  fetch_shard_store        0     0        0
node_name_xxxxxxx_3  flush                    0     0        0
node_name_xxxxxxx_3  force_merge              0     0        0
node_name_xxxxxxx_3  generic                  0     0        0
node_name_xxxxxxx_3  get                      0     0        0
node_name_xxxxxxx_3  index                    2     0       87 REJECTED
node_name_xxxxxxx_3  listener                 0     0        0
node_name_xxxxxxx_3  management               1     0        0
node_name_xxxxxxx_3  refresh                  0     0        0
node_name_xxxxxxx_3  search                   0     0      667 REJECTED
node_name_xxxxxxx_3  snapshot                 0     0        0
node_name_xxxxxxx_3  warmer                   0     0        0

Christian_Dahlqvist · May 11, 2019, 11:29am

The number of shards queried or indexed into together with the number of concurrent operations determine how many tasks that need to be queued up. Reducing the number of shards and indices therefore tends to result in less rejections due to full queues. Increasing the queue sizes does not solve the problem but is rather a bandaid that increase resource usage.

michele_crudele · May 11, 2019, 11:49am

Thanks Christian.
I understand that increasing the queue sizes is not the solution. However, what I'm trying to understand is whether the value that is actually configured is the optimal one, based on the deployment size and the number of rejected operations I posted. What do you think?

Based on the deployment I posted, which is the number of shards per index you'd suggest?
Consider that we cannot reduce the number of index.

Christian_Dahlqvist · May 11, 2019, 12:25pm

What is your use-case? Are you using time-based indices? How many of these indices are you actively indexing into? What does the projected data growth look like?

michele_crudele · May 11, 2019, 12:57pm

Frequent, concurrent updates one document at a time using update by script with dynamic compilation (I know, that's bad... we will remove dynamic compilation soon)
If that can be useful, our indexes use parent-child relation so, parent and child are routed to the same shard (it may be a coincidence, but we started observing rejected exceptions after introducing that capability. I suspect that's because mostly of the updates are sent to the same shard, and then to the same node?)
Our indexes are not time-based
All our indexes are actively indexed and searched
On projected data growth, the number of indices may double, but the data should growth very smooth on the base I provided in my previous posts. Sorry, I cannot be more precise on that.

Christian_Dahlqvist · May 11, 2019, 2:15pm

10 indices for 30GB of data sounds like a lot, and the average shard size should be quite small. At that scale it sounds like you should have a single primary shard per index. If you have more and expected growth permits I would recommend using the shrink index API to reduce the number of primary shards per index to 1.

michele_crudele · May 11, 2019, 5:17pm

There are a few indices with size > 10gb of data and that are updated more frequently. The other are smaller and less updated.
With one shard per index, wouldn't all operations go to one node only, so exacerbating the queue size problems?
At what size should we consider to add another shard to the node?

Christian_Dahlqvist · May 11, 2019, 5:23pm

The ideal shard count also depends on the number of nodes that you have.

Christian_Dahlqvist · May 11, 2019, 5:36pm

It would be much more efficient if you did bulk updates, which probably also would reduce queue pressure.

michele_crudele · May 11, 2019, 7:31pm

We have 6 nodes (I posted the characteristics of the deployment in my first post). I was thinking to have 1 shard per node (as I read from several articles), in order to split operations among nodes and relieving the problem of the queue size. So, 6 primary shards + 1 replica. Would you recommend that configuration? Since this change requires re-indexing I'm trying to figure out whether it is a good move. That's why I'm also interesting in the queue size option. Do you have a recommendation on the queue size based on the information I provided?

Christian_Dahlqvist · May 12, 2019, 6:59am

If you are sending in a single update per request changing the shard count might not help as one queue space will be required per document on the node holding the primary shard. In this case increasing the queue size might not be too bad as the amount of data held per queue slot is likely to be low. The longer the queue the longer it may take to complete aach operation.

Be aware that not using bulk updates will hurt performance and update throughput.

michele_crudele · May 12, 2019, 10:15am

Thanks. So, is it reasonable to double the bulk queue size from 200 to 400 for the deployment I posted?

Christian_Dahlqvist · May 12, 2019, 11:10am

I do not know. Test and see.

system · June 9, 2019, 11:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Bulk thread pool rejections Elasticsearch	5	6558	July 6, 2017
Best practice for thread pool queue size Elasticsearch	3	2057	July 6, 2017
Thread pool configuration Elasticsearch	12	1046	March 6, 2020
Which nodes handle the search threadpool Elasticsearch	12	1280	December 9, 2016
How to find the thread pool size of an ElasticSearch cluster? Elasticsearch	5	491	July 6, 2017

Adjusting threadpool queue size to reduce number rof rejected operations

Related topics