Thread Pool Configuration - Max thread_pool.size?

Hi All,

Kindly help me with my query

What would be the best thread pool configuration for a data node running on a server with the following specs?

lscpu output:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 28
On-line CPU(s) list: 0-27
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 14
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 2097.691
BogoMIPS: 4195.23
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 40960K
NUMA node0 CPU(s): 0-27

Current Settings are

thread_pool:
  write:
    size: 24
  search:
    size: 85
    queue_size: 500
    min_queue_size: 10
    max_queue_size: 1000
    auto_queue_frame_size: 2000
    target_response_time: 1s

and this server receives a very high volume of search queries:

node_name           name   active queue 
data-node-1         search     85    51 

Because of this the queue is filling up, which should be fine since our max_queue_size is 1000. But Elasticsearch starts rejecting requests once the queue reaches 51. Why is that? Is it because of target_response_time?
And what would be a sensible maximum value for thread_pool.search.size?

Before giving any recommendations I have some questions:

  • How many nodes do you have in the cluster?

  • Which version of Elasticsearch are you using?

  • What is the full output from the cluster stats API?

  • How many concurrent queries are you expecting to need to support?

We have 11 nodes in total

5 data nodes
3 coordinating nodes
3 master nodes

Version 6.5.4

GET /_cluster/stats:

{
"_nodes": {
"total": 11,
"successful": 11,
"failed": 0
},
"cluster_name": "blackhole",
"cluster_uuid": "GEbMwYZ1Q32OWJFzEdTQSA",
"timestamp": 1586948033165,
"status": "green",
"indices": {
"count": 110,
"shards": {
"total": 1004,
"primaries": 502,
"replication": 1,
"index": {
"shards": {
"min": 2,
"max": 10,
"avg": 9.127272727272727
},
"primaries": {
"min": 1,
"max": 5,
"avg": 4.5636363636363635
},
"replication": {
"min": 1,
"max": 1,
"avg": 1
}
}
},
"docs": {
"count": 367659960,
"deleted": 50731783
},
"store": {
"size_in_bytes": 1119225068695
},
"fielddata": {
"memory_size_in_bytes": 23710584,
"evictions": 0
},
"query_cache": {
"memory_size_in_bytes": 4513623739,
"total_count": 637231173997,
"hit_count": 143510379985,
"miss_count": 493720794012,
"cache_size": 291378,
"cache_count": 3239636308,
"evictions": 3239344930
},
"completion": {
"size_in_bytes": 0
},
"segments": {
"count": 11170,
"memory_in_bytes": 13853151308,
"terms_memory_in_bytes": 13708166460,
"stored_fields_memory_in_bytes": 73873832,
"term_vectors_memory_in_bytes": 0,
"norms_memory_in_bytes": 24145472,
"points_memory_in_bytes": 18040064,
"doc_values_memory_in_bytes": 28925480,
"index_writer_memory_in_bytes": 0,
"version_map_memory_in_bytes": 0,
"fixed_bit_set_memory_in_bytes": 164822672,
"max_unsafe_auto_id_timestamp": -1,
"file_sizes": {}
}
},
"nodes": {
"count": {
"total": 11,
"data": 5,
"coordinating_only": 3,
"master": 3,
"ingest": 0
},
"versions": [
"6.5.4"
],
"os": {
"available_processors": 200,
"allocated_processors": 200,
"names": [
{
"name": "Linux",
"count": 11
}
],
"mem": {
"total_in_bytes": 641698430976,
"free_in_bytes": 99411136512,
"used_in_bytes": 542287294464,
"free_percent": 15,
"used_percent": 85
}
},
"process": {
"cpu": {
"percent": 412
},
"open_file_descriptors": {
"min": 512,
"max": 3753,
"avg": 1741
}
},
"jvm": {
"max_uptime_in_millis": 24200534975,
"versions": [
{
"version": "1.8.0_201",
"vm_name": "Java HotSpot(TM) 64-Bit Server VM",
"vm_version": "25.201-b09",
"vm_vendor": "Oracle Corporation",
"count": 11
}
],
"mem": {
"heap_used_in_bytes": 167080555032,
"heap_max_in_bytes": 346575339520
},
"threads": 2920
},
"fs": {
"total_in_bytes": 3156332773376,
"free_in_bytes": 2000615645184,
"available_in_bytes": 2000615645184
},
"plugins": [],
"network_types": {
"transport_types": {
"security4": 11
},
"http_types": {
"security4": 11
}
}
}
}

How many concurrent queries can our Elasticsearch cluster serve at its best?

It looks like you have a lot of small shards. How many of these does each query typically target?


Each query targets 5 shards at a time.

Does the number of shards increase the number of threads used by a search query?
Is there any relation between the number of threads and the number of shards?

Each shard queried is put on the queue, so a large number of concurrent queries or a large number of shards queried per request can fill up the queue. Given the number of CPU cores you have available and the size of the data, it is quite likely that your query performance is limited by disk I/O. Increasing queue sizes will not improve this; it will just allow you to queue more work for longer on the nodes.
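To make that concrete, here is a rough back-of-the-envelope sketch using the numbers from this thread. It is a simplification (it ignores I/O and per-shard cost differences), but it shows how shards-per-query eats into the thread-plus-queue budget:

```python
# Rough sketch: every shard queried becomes one task on the search
# thread pool, so capacity is consumed at (queries x shards_per_query).
# Numbers are taken from this thread; real behaviour also depends on I/O.

threads = 85            # thread_pool.search.size on the data node
queue_capacity = 50     # current auto-adjusted queue capacity (from the rejection log)
shards_per_query = 5    # each query targets 5 shards

def max_in_flight_queries(threads, queue_capacity, shards_per_query):
    """How many concurrent queries fit (running + queued) before rejections start."""
    total_slots = threads + queue_capacity
    return total_slots // shards_per_query

print(max_in_flight_queries(threads, queue_capacity, shards_per_query))  # 27
```

So with a queue capacity of 50, roughly 27 queries can be in flight on that node before shard-level tasks start being rejected.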

What type of storage are you using?


Storage is SSD

How many concurrent queries can our Elasticsearch cluster serve with the current servers?

Should we increase thread_pool.search.size to serve more requests? Will that help?

I do not know what type of queries or data you have, nor what is currently limiting performance, so it is hard to tell. I would not recommend altering the thread pool sizes unless you really know what you are doing, as the defaults are generally quite good.

Even though you have local SSDs, look at disk I/O and iowait just to be sure storage is not a bottleneck. Also check whether you are seeing long or frequent GC. Then look at the load on the coordinating-only nodes and check that they are not the limiting factor.
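For reference, the 6.x defaults are derived from the processor count. A quick sketch of those formulas (as documented for the fixed `write` and `search` pools in 6.x; treat this as a sketch, not authoritative):

```python
# Sketch of the Elasticsearch 6.x default thread pool sizing formulas.

def default_write_size(processors):
    # 'write' pool: fixed, sized to the number of available processors
    return processors

def default_search_size(processors):
    # 'search' pool: fixed, int((processors * 3) / 2) + 1
    return (processors * 3) // 2 + 1

processors = 28  # CPUs on the data node in this thread
print(default_write_size(processors))   # 28
print(default_search_size(processors))  # 43
```

So with 28 CPUs the defaults would be roughly write=28 and search=43; the configured search size of 85 is about double the default for this hardware.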


The issue only occurs when there is a sudden increase in the number of queries, and I get this exception:
rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@6fbf2782 on QueueResizingEsThreadPoolExecutor[name = data-node-3/search, queue capacity = 50, min queue capacity = 10, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 255.6ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@329b63d9[Running, pool size = 85, active threads = 85, queued tasks = 52, completed tasks = 4333428590]]

In this log the queued tasks are only 52. Why is that? Shouldn't the queue be able to grow to 1000?

You should remove your adjustments to the default thread pool configuration. The default behaviour is the recommended one and will get the best out of your cluster. If you are having performance problems (e.g. rejections) then the solution is not to adjust the thread pool configuration.
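For context on why the capacity in your log sits at 50 rather than 1000: the auto-resizing executor estimates an optimal queue length from Little's law (L = λ × W) and moves the current capacity toward that estimate in fixed steps (the "adjustment amount = 50" in your log), clamped between the min and max queue sizes. A simplified sketch of the idea (not Elasticsearch's actual code; names and numbers are illustrative):

```python
# Simplified sketch of auto queue resizing via Little's law.
# Not Elasticsearch's actual implementation; names are illustrative.

def optimal_queue_size(task_rate_per_sec, target_response_time_sec):
    # Little's law: L = lambda * W
    return int(task_rate_per_sec * target_response_time_sec)

def adjust_capacity(current, optimal, step=50, lo=10, hi=1000):
    # Move the current capacity one step toward the estimate, then clamp.
    if optimal > current:
        current += step
    elif optimal < current:
        current -= step
    return max(lo, min(hi, current))

# If measured throughput only supports ~40 tasks/s at a 1s target,
# the capacity ratchets down step by step toward the estimate,
# e.g. 500 -> 450 -> ... -> 50.
cap = 500
for _ in range(9):
    cap = adjust_capacity(cap, optimal_queue_size(40, 1.0))
print(cap)  # 50
```

In other words, the 52 queued tasks you see correspond to the queue sitting at its current auto-adjusted capacity of about 50; max_queue_size=1000 is only the ceiling the capacity may grow to, not a guarantee of queue room.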


Thank you for all the information. This will be really helpful 🙂

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.