Elasticsearch cluster [6.8] becomes unresponsive for a small duration when one of the nodes in the cluster does not respond to any requests & is not part of the cluster. Is this the expected behaviour?

srivatsa · September 29, 2022, 4:25am

Hi Folks,
In our 40-node ES cluster (version 6.8), we see a problem whenever any one of the nodes becomes unresponsive, especially when the thread pools on that machine are full and requests are being rejected:

Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@653a8352 on QueueResizingEsThreadPoolExecutor[name = 15.1.23.14/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 12.4ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@7ff8f3d8[Running, pool size = 13, active threads = 13, queued tasks = 1003, completed tasks = 255488]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.11.jar:6.8.11]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_292]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:98) ~[elasticsearch-6.8.11.jar:6.8.11]
	... 58 more

During this time, while the node is unresponsive, it is also unable to communicate with the master node, and it drops out of the cluster (after 2 minutes). Throughout this period our monitoring systems cannot get any data from the cluster nodes, and any query we run gets no response from the cluster.

We want to understand whether this is expected behavior (the whole cluster being unable to serve requests when a single node is unresponsive) or whether it points to a misconfiguration.
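
For readers triaging a similar trace, here is a minimal sketch that pulls the `key = value` fields out of the rejection message quoted above and checks whether the pool was saturated. The function names `parse_rejection` and `summarize` are hypothetical helpers written for this illustration, not part of Elasticsearch, and the regex assumes the message layout shown in this 6.8.11 trace.

```python
import re

# Rejection message copied from the stack trace quoted in this post.
MESSAGE = (
    "rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@653a8352 "
    "on QueueResizingEsThreadPoolExecutor[name = 15.1.23.14/search, queue capacity = 1000, "
    "min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, "
    "targeted response rate = 1s, task execution EWMA = 12.4ms, adjustment amount = 50, "
    "org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@7ff8f3d8"
    "[Running, pool size = 13, active threads = 13, queued tasks = 1003, completed tasks = 255488]]"
)


def parse_rejection(message):
    """Return the 'key = value' pairs found in a rejection message as a dict of strings."""
    # A key is a run of letters and spaces; a value runs up to the next ',' or ']'.
    return dict(re.findall(r"([A-Za-z][A-Za-z ]*?) = ([^,\]]+)", message))


def summarize(message):
    """Report which pool rejected the task and whether its threads and queue were full."""
    fields = parse_rejection(message)
    return {
        "pool": fields["name"],
        "all_threads_busy": int(fields["active threads"]) >= int(fields["pool size"]),
        "queue_full": int(fields["queued tasks"]) >= int(fields["queue capacity"]),
    }


if __name__ == "__main__":
    print(summarize(MESSAGE))
```

For this trace the summary shows the `search` pool on 15.1.23.14 with all 13 threads active and the queue past its capacity of 1000, which matches the rejections described above. It says nothing about why the rest of the cluster stalls.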

Christian_Dahlqvist · September 29, 2022, 5:34am

Version 6.8.11 is very old and EOL. A lot of improvements around resiliency and stability have been made in more recent versions so I would recommend that you upgrade as soon as possible.

srivatsa · October 4, 2022, 4:45am

Hi Christian,
We are planning to upgrade to Elasticsearch version 7.17.4. Will that help with the issue we are facing?
Have any fixes been made for stability issues similar to the one we are seeing?

Christian_Dahlqvist · October 4, 2022, 11:27am

A lot of improvements to cluster stability have been made in the Elasticsearch 7.x range, so I would expect it to behave better. I do not have a list of specific changes though.

DavidTurner · October 4, 2022, 11:54am

Maybe one of the issues linked from here?

github.com/elastic/elasticsearch

Fix Large Shard Count Scalability Issues

opened 05:44AM - 09 Sep 21 UTC

original-brownbear

>bug release highlight Meta :Distributed/Distributed :Security/Authorization :Data Management/Other Team:Data Management Team:Distributed Team:Security

This meta issue tracks known issues with scaling clusters to large numbers of shards.

- Security
  - [ ] https://github.com/elastic/elasticsearch/issues/67987
  - [x] https://github.com/elastic/elasticsearch/issues/79632
- General
  - [ ] https://github.com/elastic/elasticsearch/issues/51992
  - [ ] https://github.com/elastic/elasticsearch/issues/87555
  - [x] https://github.com/elastic/elasticsearch/pull/77546
  - [ ] https://github.com/elastic/elasticsearch/issues/79563
  - [x] https://github.com/elastic/elasticsearch/pull/79793
  - [x] https://github.com/elastic/elasticsearch/issues/79906
  - [x] https://github.com/elastic/elasticsearch/pull/80064
  - [ ] https://github.com/elastic/elasticsearch/issues/80694
  - [ ] https://github.com/elastic/elasticsearch/issues/81626
  - [x] https://github.com/elastic/elasticsearch/issues/81627
  - [x] https://github.com/elastic/elasticsearch/issues/81628
  - [ ] https://github.com/elastic/elasticsearch/issues/81846
  - [x] https://github.com/elastic/elasticsearch/issues/81880
  - [x] https://github.com/elastic/elasticsearch/issues/82337
  - [x] https://github.com/elastic/elasticsearch/issues/82342
  - [x] https://github.com/elastic/elasticsearch/pull/82608
  - [x] https://github.com/elastic/elasticsearch/pull/82227
  - [x] https://github.com/elastic/elasticsearch/pull/82830
  - [ ] https://github.com/elastic/elasticsearch/issues/83049
  - [ ] https://github.com/elastic/elasticsearch/issues/83203
  - [x] https://github.com/elastic/elasticsearch/issues/83204 -> #85380
  - [x] https://github.com/elastic/elasticsearch/issues/85839
  - [x] https://github.com/elastic/elasticsearch/issues/86639
  - [x] https://github.com/elastic/elasticsearch/issues/87681
  - [ ] https://github.com/elastic/elasticsearch/issues/90631
- Snapshots + SLM
  - [x] https://github.com/elastic/elasticsearch/pull/80942
  - [ ] https://github.com/elastic/elasticsearch/issues/82824
  - [ ] https://github.com/elastic/elasticsearch/issues/82853
  - [x] https://github.com/elastic/elasticsearch/issues/82937
  - [x] https://github.com/elastic/elasticsearch/pull/88707
  - [x] https://github.com/elastic/elasticsearch/issues/88732
  - [ ] https://github.com/elastic/elasticsearch/issues/89163
  - [ ] https://github.com/elastic/elasticsearch/issues/89952
- Metrics
  - [x] https://github.com/elastic/elasticsearch/issues/80428
- ILM + Allocation
  - [x] https://github.com/elastic/elasticsearch/pull/78547
  - [x] #78235
  - [x] https://github.com/elastic/elasticsearch/issues/78246
  - [x] #78075
  - [x] #77965
  - [x] https://github.com/elastic/elasticsearch/pull/77855
  - [x] https://github.com/elastic/elasticsearch/pull/77863
  - [x] https://github.com/elastic/elasticsearch/pull/78742
  - [x] https://github.com/elastic/elasticsearch/pull/78668
  - [x] https://github.com/elastic/elasticsearch/pull/78672
  - [x] https://github.com/elastic/elasticsearch/pull/78609
  - [x] https://github.com/elastic/elasticsearch/pull/78745
  - [x] https://github.com/elastic/elasticsearch/pull/78813
  - [ ] https://github.com/elastic/elasticsearch/issues/78892
  - [x] https://github.com/elastic/elasticsearch/pull/80493
  - [x] https://github.com/elastic/elasticsearch/issues/77888
  - [x] https://github.com/elastic/elasticsearch/pull/78931
  - [x] https://github.com/elastic/elasticsearch/pull/78969
  - [x] https://github.com/elastic/elasticsearch/issues/78980
  - [x] https://github.com/elastic/elasticsearch/issues/79782
  - [x] https://github.com/elastic/elasticsearch/issues/79866
  - [x] https://github.com/elastic/elasticsearch/pull/79860
  - [x] https://github.com/elastic/elasticsearch/pull/79941
  - [x] https://github.com/elastic/elasticsearch/pull/80179
  - [ ] https://github.com/elastic/elasticsearch/issues/80407
  - [x] https://github.com/elastic/elasticsearch/issues/81880
  - [x] https://github.com/elastic/elasticsearch/pull/82251
  - [x] https://github.com/elastic/elasticsearch/issues/82708
  - [x] https://github.com/elastic/elasticsearch/pull/83092
  - [x] https://github.com/elastic/elasticsearch/pull/83241
  - [x] https://github.com/elastic/elasticsearch/pull/83340
  - [x] https://github.com/elastic/elasticsearch/issues/83582
  - [x] https://github.com/elastic/elasticsearch/pull/84034
  - [ ] https://github.com/elastic/elasticsearch/issues/89924
- Search
  - [x] https://github.com/elastic/elasticsearch/issues/74648
  - [x] https://github.com/elastic/elasticsearch/issues/78164
  - [x] https://github.com/elastic/elasticsearch/pull/76405
  - [x] https://github.com/elastic/elasticsearch/pull/77131
  - [x] https://github.com/elastic/elasticsearch/pull/77201
  - [x] https://github.com/elastic/elasticsearch/pull/77251
  - [x] https://github.com/elastic/elasticsearch/issues/78314
  - [x] https://github.com/elastic/elasticsearch/pull/78508
  - [x] https://github.com/elastic/elasticsearch/issues/82879
  - [ ] https://github.com/elastic/elasticsearch/issues/89309
- Network
  - [x] https://github.com/elastic/elasticsearch/issues/82245
  - [ ] https://github.com/elastic/elasticsearch/issues/79560
  - [ ] https://github.com/elastic/elasticsearch/pull/83846
  - [ ] https://github.com/elastic/elasticsearch/issues/84876
  - [ ] https://github.com/elastic/elasticsearch/issues/84887

That said, there's no point in putting effort in to work out which one might explain your specific problem when using such an old version. You need to upgrade first and then if you continue to have problems we can dig deeper.

system · November 1, 2022, 11:55am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
