Hi Folks,
We are seeing a problem in our ES cluster (version 6.8, 40 nodes) whenever any one of the nodes becomes unresponsive, especially when the thread pools on that machine are full and requests are being rejected:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@653a8352 on QueueResizingEsThreadPoolExecutor[name = 15.1.23.14/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 12.4ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@7ff8f3d8[Running, pool size = 13, active threads = 13, queued tasks = 1003, completed tasks = 255488]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.11.jar:6.8.11]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_292]
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:98) ~[elasticsearch-6.8.11.jar:6.8.11]
... 58 more
While the node is unresponsive, it is also unable to communicate with the master node, and it drops out of the cluster (after 2 minutes). During this time our monitoring systems cannot get any data from the cluster nodes, and the cluster does not respond to any query we run.
We want to understand whether it is expected behavior for the whole cluster to stop serving requests when a single node is unresponsive, or whether this is a misconfiguration.
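For anyone reading a rejection message like the one above, here is a minimal sketch of how the pool counters in it can be pulled out and compared. It is not part of any Elasticsearch tooling: the helper names `parse_rejection` and `is_saturated`, the regexes, and the abbreviated `SAMPLE` string are illustrative assumptions based only on the message text in this post.

```python
import re

# Abbreviated copy of the rejection message from the post ("..." marks text left out here).
SAMPLE = (
    "rejected execution of ... on QueueResizingEsThreadPoolExecutor["
    "name = 15.1.23.14/search, queue capacity = 1000, min queue capacity = 1000, "
    "max queue capacity = 1000, ... [Running, pool size = 13, active threads = 13, "
    "queued tasks = 1003, completed tasks = 255488]]"
)

# Hypothetical field map: keys are our own names, patterns mirror the message text.
_FIELDS = {
    "queue_capacity": r"queue capacity = (\d+)",
    "pool_size": r"pool size = (\d+)",
    "active_threads": r"active threads = (\d+)",
    "queued_tasks": r"queued tasks = (\d+)",
    "completed_tasks": r"completed tasks = (\d+)",
}


def parse_rejection(message: str) -> dict:
    """Extract the thread pool name and counters from a rejection message."""
    stats = {}
    name = re.search(r"\[name = ([^,\]]+)", message)
    if name:
        stats["name"] = name.group(1)
    for key, pattern in _FIELDS.items():
        match = re.search(pattern, message)  # first occurrence wins
        if match:
            stats[key] = int(match.group(1))
    return stats


def is_saturated(stats: dict) -> bool:
    """True when every thread is busy and the queue is at or over capacity."""
    return (
        stats["active_threads"] >= stats["pool_size"]
        and stats["queued_tasks"] >= stats["queue_capacity"]
    )


if __name__ == "__main__":
    stats = parse_rejection(SAMPLE)
    print(stats["name"], "saturated:", is_saturated(stats))
```

For the message above, this reports all 13 search threads active and 1003 tasks queued against a capacity of 1000, so the search pool on that node was completely full when the request was rejected.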
Version 6.8.11 is very old and EOL. A lot of improvements around resiliency and stability have been made in more recent versions so I would recommend that you upgrade as soon as possible.
Hi Christian,
We are planning to upgrade to Elasticsearch version 7.17.4. Will that help with the issue we are facing?
Have any fixes been made for stability issues similar to the one we are facing?
A lot of improvements to cluster stability have been made in the Elasticsearch 7.x range, so I would expect it to behave better. I do not have a list of specific changes though.
That said, there's no point in putting effort into working out which change might explain your specific problem when using such an old version. You need to upgrade first, and then if you continue to have problems we can dig deeper.