Queued tasks exceed max queue capacity

We are running Elasticsearch version 6.3.2, with 5 master nodes and 15 data nodes.
The following problems occur in the log:

Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: 
rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@1d9f2200 on QueueResizingEsThreadPoolExecutor
[name = xxxxx-data/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 284.8micros, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@1f60aa97
[Running, pool size = 49, active threads = 49, queued tasks = 56999, completed tasks = 336035534]]

The number of queued tasks is greater than the max queue capacity.
In this case, the _cat/nodes operation times out:
collector [node_stats] timed out when collecting data
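As an aside, the mismatch is easy to confirm mechanically. A minimal Python sketch that pulls the two numbers out of the rejection line (the message string below is an abridged copy of the log above; the helper name is just illustrative):

```python
import re

# Abridged copy of the rejection message from the log above
msg = ("rejected execution of TimedRunnable on QueueResizingEsThreadPoolExecutor"
       "[name = xxxxx-data/search, queue capacity = 1000, min queue capacity = 1000, "
       "max queue capacity = 1000, frame size = 2000, "
       "[Running, pool size = 49, active threads = 49, "
       "queued tasks = 56999, completed tasks = 336035534]]")

def stat(name: str, text: str) -> int:
    """Pull an integer field such as 'queued tasks = 56999' out of the message."""
    return int(re.search(rf"{re.escape(name)} = (\d+)", text).group(1))

capacity = stat("max queue capacity", msg)
queued = stat("queued tasks", msg)
print(f"queued tasks {queued} vs. max queue capacity {capacity}")
print("queue overrun" if queued > capacity else "within capacity")
```

The capacity limits how many tasks wait for a free thread; the queued-tasks counter in the message can report far more because it is a snapshot of everything backed up behind the 49 busy threads.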

Elasticsearch 6.X is EOL; please upgrade as a matter of urgency.

Basically this is saying your cluster is overloaded. What is the output from the _cluster/stats?pretty&human API?

Sorry, the cluster health is green now, and I did not keep the _cluster/stats information.

But when I ran _cluster/health?pretty on a master node while the cluster had the problem, it showed green health, while any REST API call I ran on a data node timed out.

Even when front-end traffic was limited to very low levels, the cluster logs still threw timeout and queue exceptions. Finally, all indexes were closed and then reopened; after that, the log no longer threw this error, and as the front-end traffic limit was gradually lifted, the cluster remained normal.

There are two things I want to confirm:

  1. Are the queued tasks real requests? After restarting the cluster, the log still showed a large number of queue exceptions.

  2. Does closing an index clear its queue?

It's stored in the cluster; you can query it at any time.

{
  "_nodes" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "cluster_name" : "cluster_name2",
  "timestamp" : 1665727685194,
  "status" : "green",
  "indices" : {
    "count" : 99,
    "shards" : {
      "total" : 632,
      "primaries" : 281,
      "replication" : 1.2491103202846976,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 40,
          "avg" : 6.383838383838384
        },
        "primaries" : {
          "min" : 1,
          "max" : 10,
          "avg" : 2.8383838383838382
        },
        "replication" : {
          "min" : 1.0,
          "max" : 3.0,
          "avg" : 1.2626262626262625
        }
      }
    },
    "docs" : {
      "count" : 2038649028,
      "deleted" : 651890076
    },
    "store" : {
      "size" : "1.5tb",
      "size_in_bytes" : 1718697492594
    },
    "fielddata" : {
      "memory_size" : "60.6kb",
      "memory_size_in_bytes" : 62120,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 6392,
      "memory" : "5.7gb",
      "memory_in_bytes" : 6166553349,
      "terms_memory" : "5.3gb",
      "terms_memory_in_bytes" : 5782436211,
      "stored_fields_memory" : "276.3mb",
      "stored_fields_memory_in_bytes" : 289747000,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "2mb",
      "norms_memory_in_bytes" : 2189888,
      "points_memory" : "75.9mb",
      "points_memory_in_bytes" : 79617346,
      "doc_values_memory" : "11.9mb",
      "doc_values_memory_in_bytes" : 12562904,
      "index_writer_memory" : "4.3mb",
      "index_writer_memory_in_bytes" : 4535992,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "287.6kb",
      "fixed_bit_set_memory_in_bytes" : 294576,
      "max_unsafe_auto_id_timestamp" : 1665705603283,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 20,
      "data" : 15,
      "coordinating_only" : 0,
      "master" : 5,
      "ingest" : 20
    },
    "versions" : [
      "6.3.2"
    ],
    "os" : {
      "available_processors" : 640,
      "allocated_processors" : 640,
      "names" : [
        {
          "name" : "Linux",
          "count" : 20
        }
      ],
      "mem" : {
        "total" : "3.6tb",
        "total_in_bytes" : 4028666773504,
        "free" : "370.8gb",
        "free_in_bytes" : 398206930944,
        "used" : "3.3tb",
        "used_in_bytes" : 3630459842560,
        "free_percent" : 10,
        "used_percent" : 90
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 143
      },
      "open_file_descriptors" : {
        "min" : 1260,
        "max" : 1815,
        "avg" : 1520
      }
    },
    "jvm" : {
      "max_uptime" : "11.6d",
      "max_uptime_in_millis" : 1005554496,
      "versions" : [
        {
          "version" : "1.8.0_192",
          "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version" : "25.192-b12",
          "vm_vendor" : "Oracle Corporation",
          "count" : 20
        }
      ],
      "mem" : {
        "heap_used" : "272.4gb",
        "heap_used_in_bytes" : 292565225120,
        "heap_max" : "616.2gb",
        "heap_max_in_bytes" : 661707816960
      },
      "threads" : 6562
    },
    "fs" : {
      "total" : "54.5tb",
      "total_in_bytes" : 59982473994240,
      "free" : "46.6tb",
      "free_in_bytes" : 51261142790144,
      "available" : "46.6tb",
      "available_in_bytes" : 51261142790144
    },
    "plugins" : [
      {
        "name" : "analysis-jieba",
        "version" : "6.3.2",
        "elasticsearch_version" : "6.3.2",
        "java_version" : "1.8",
        "description" : "A jieba analysis of plugins for Elasticsearch",
        "classname" : "org.elasticsearch.plugin.analysis.jieba.AnalysisJiebaPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      },
      {
        "name" : "analysis-ik",
        "version" : "6.3.2",
        "elasticsearch_version" : "6.3.2",
        "java_version" : "1.8",
        "description" : "IK Analyzer for Elasticsearch",
        "classname" : "org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 20
      },
      "http_types" : {
        "security4" : 20
      }
    }
  }
}

The preceding information is from the _cluster/stats?pretty&human API. I had added 5 data nodes while the cluster was throwing the timeout errors.
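For reference, a couple of quick sanity checks can be derived from the stats above. The figures below are copied verbatim from the JSON; the arithmetic is just a summary, not an assessment:

```python
# Figures copied from the _cluster/stats output above
heap_used_in_bytes = 292565225120   # "heap_used" : "272.4gb"
heap_max_in_bytes  = 661707816960   # "heap_max"  : "616.2gb"
shards_total = 632                  # "shards" : { "total" : 632, ... }
data_nodes = 15                     # "count" : { "data" : 15, ... }

heap_used_pct = 100 * heap_used_in_bytes / heap_max_in_bytes
shards_per_data_node = shards_total / data_nodes

print(f"cluster-wide heap usage: {heap_used_pct:.1f}%")             # 44.2%
print(f"average shards per data node: {shards_per_data_node:.1f}")  # 42.1
```

Neither number looks extreme on its own, which fits the observation that the cluster reported green health even while the search queues were backed up.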

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.