Elasticsearch cluster [6.8] becomes unresponsive for a short time when one node stops responding to requests and drops out of the cluster. Is this the expected behaviour?

Hi Folks,
In our ES cluster (version 6.8, 40 nodes) we observe that the whole cluster becomes unresponsive when any one of the nodes becomes unresponsive, especially when the thread pools on that machine are full and requests are being rejected:

Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@653a8352 on QueueResizingEsThreadPoolExecutor[name = 15.1.23.14/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 12.4ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@7ff8f3d8[Running, pool size = 13, active threads = 13, queued tasks = 1003, completed tasks = 255488]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.11.jar:6.8.11]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_292]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:98) ~[elasticsearch-6.8.11.jar:6.8.11]
	... 58 more
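
To make the rejection easier to read, the executor state embedded in the message above can be extracted with a small script. This is only a minimal sketch: the helper names `parse_rejection` and `is_saturated` are hypothetical, and the field labels are taken from this one log line, so other Elasticsearch versions may format the message differently.

```python
import re

# The rejection message from the log above, trimmed to the executor state.
LOG_LINE = (
    "QueueResizingEsThreadPoolExecutor[name = 15.1.23.14/search, "
    "queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, "
    "frame size = 2000, targeted response rate = 1s, task execution EWMA = 12.4ms, "
    "adjustment amount = 50, "
    "org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@7ff8f3d8"
    "[Running, pool size = 13, active threads = 13, queued tasks = 1003, "
    "completed tasks = 255488]]"
)

# Integer fields of interest, spelled as they appear in the message.
FIELDS = ("queue capacity", "pool size", "active threads",
          "queued tasks", "completed tasks")


def parse_rejection(line):
    """Return a dict of the integer executor fields found in the message."""
    state = {}
    for field in FIELDS:
        # Require a preceding ',' or '[' so that "queue capacity" does not
        # match inside "min queue capacity" or "max queue capacity".
        match = re.search(r"[\[,]\s*" + re.escape(field) + r" = (\d+)", line)
        if match:
            state[field] = int(match.group(1))
    return state


def is_saturated(state):
    """True when every worker is busy and the queue is at or over capacity."""
    return (state["active threads"] >= state["pool size"]
            and state["queued tasks"] >= state["queue capacity"])


state = parse_rejection(LOG_LINE)
print(state)
print("saturated:", is_saturated(state))
```

For the line above, all 13 search threads are active and the queue holds 1003 tasks against a capacity of 1000, which is why new search tasks on this node are being rejected.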

While the node is unresponsive, it is also unable to communicate with the master node, and it is removed from the cluster (after 2 minutes). During this time our monitoring systems cannot get any data from the cluster nodes, and the cluster does not respond to any query we run.

We want to understand whether it is expected behavior for the cluster to be unable to serve requests when a single node is unresponsive, or whether this is a misconfiguration.

Version 6.8.11 is very old and EOL. A lot of improvements around resiliency and stability have been made in more recent versions so I would recommend that you upgrade as soon as possible.

Hi Christian,
We are planning to upgrade to Elasticsearch version 7.17.4. Will that help with the issue we are facing?
Have any fixes been made for stability issues similar to the one we are facing?

A lot of improvements to cluster stability have been made in the Elasticsearch 7.x range, so I would expect it to behave better. I do not have a list of specific changes though.

Maybe one of the issues linked from here?

That said, there's no point in putting effort in to work out which one might explain your specific problem when using such an old version. You need to upgrade first and then if you continue to have problems we can dig deeper.
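
If the problem does persist after the upgrade, one starting point for digging deeper is the per-node search thread pool counters from the `_cat/thread_pool` API. The following is a rough sketch only: the helper names and the sample rows are hypothetical, and the cluster address in the usage comment is an assumption.

```python
import json
import urllib.request


def fetch_search_pools(base_url):
    """Fetch per-node search thread pool stats as a list of dicts."""
    url = (base_url + "/_cat/thread_pool/search"
           "?format=json&h=node_name,name,active,queue,rejected")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def nodes_with_rejections(rows):
    """Return (node_name, rejected) pairs for nodes with rejections, highest first."""
    hits = [(row["node_name"], int(row["rejected"]))
            for row in rows if int(row["rejected"]) > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample rows in the shape the _cat API returns with format=json
# (numeric columns arrive as strings).
sample_rows = [
    {"node_name": "node-a", "name": "search", "active": "13", "queue": "1000", "rejected": "4210"},
    {"node_name": "node-b", "name": "search", "active": "2", "queue": "0", "rejected": "0"},
    {"node_name": "node-c", "name": "search", "active": "13", "queue": "640", "rejected": "17"},
]

for node, rejected in nodes_with_rejections(sample_rows):
    print(node, rejected)

# Against a live cluster (address is an assumption):
# rows = fetch_search_pools("http://localhost:9200")
```

A node whose `rejected` count keeps climbing between polls is the one to look at first.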
