Hello, everyone. This cluster has been running for a long time, but today far too many pending tasks piled up. I checked the master log and found the following warning message: after sending the ClusterState, the master waits for a response from the bdes225 node until the wait times out.
I checked the log on bdes225 and found the following errors.
There is one question I have not been able to answer.
1. fatal error in thread [elasticsearch[bdes225-prd1][clusterService#updateTask][T#5]], exiting
After this line is logged, the ElasticsearchUncaughtExceptionHandler::halt method is executed, but the JVM does not exit.
Is this stock Elasticsearch or have you modified it?
Thanks, I've found the cause of the problem. But in this case, I would expect either the thread to die or the JVM to exit.
Instead, these errors kept being reported for several hours and the pending tasks piled up into the tens of thousands, seriously affecting our production environment.
I don't think a failure on one node should affect the whole cluster, so I submitted a PR.
Our cluster performs metadata operations very frequently. Metadata updates are executed serially on a single thread and finally pass through the allocation module. I optimized the logic of this part, modified the source code, and rebuilt it.
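To illustrate why serial, single-threaded execution lets pending tasks pile up when one update never completes, here is a minimal Java sketch. It is a hypothetical model using a plain one-thread ThreadPoolExecutor, not the Elasticsearch master service; the class and method names are invented for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SerialUpdateQueueSketch {

    // Hypothetical model: one worker thread processes "metadata updates" strictly in order.
    // Returns how many tasks are waiting in the queue while the first task is stuck.
    public static int pendingBehindStuckTask(int submitted) {
        ThreadPoolExecutor updateExecutor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);

        // The first task stands in for an update that cannot finish
        // (for example, one waiting on a node that never responds).
        updateExecutor.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Every later update has to wait behind it in the queue.
        for (int i = 0; i < submitted; i++) {
            updateExecutor.execute(() -> { /* hypothetical metadata update */ });
        }
        int pending = updateExecutor.getQueue().size();

        // Unblock the stuck task so the sketch shuts down cleanly.
        release.countDown();
        updateExecutor.shutdown();
        try {
            updateExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pending;
    }

    public static void main(String[] args) {
        System.out.println("pending tasks: " + pendingBehindStuckTask(1000));
    }
}
```

Because only one thread drains the queue, a single update that never finishes blocks every update submitted after it.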
The cause of the problem is that I replaced the original jar, but for particular reasons the process on this node was not restarted. I have reproduced the problem in a test environment. So it is not a general problem, sorry.
Right, so as suspected, you've modified Elasticsearch and broken it. We cannot support that.
When something happens that we cannot recover from, the correct behavior for Elasticsearch is for the process to die. That is what the uncaught exception handler does. A NoClassDefFoundError is unrecoverable.
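As an illustration of the behavior described above, here is a minimal Java sketch of an uncaught exception handler that requests a process halt on fatal errors. It is not the Elasticsearch implementation: the class names are hypothetical, treating every java.lang.Error (such as NoClassDefFoundError) as fatal is an assumption, and the halt action is injected so the demo records the exit code instead of calling Runtime.getRuntime().halt.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

public class HaltingHandlerSketch {

    // Hypothetical handler: fatal errors request a halt, other exceptions are only logged.
    static final class FatalErrorHandler implements Thread.UncaughtExceptionHandler {
        private final IntConsumer halter; // injected so the sketch can run without killing the JVM

        FatalErrorHandler(IntConsumer halter) {
            this.halter = halter;
        }

        static boolean isFatal(Throwable t) {
            // Assumption: every java.lang.Error is treated as unrecoverable.
            return t instanceof Error;
        }

        @Override
        public void uncaughtException(Thread thread, Throwable t) {
            if (isFatal(t)) {
                System.err.println("fatal error in thread [" + thread.getName() + "], exiting");
                halter.accept(1);
            } else {
                System.err.println("uncaught exception in thread [" + thread.getName() + "]: " + t);
            }
        }
    }

    // Runs a worker that throws NoClassDefFoundError and returns the halt code the handler requested.
    static int haltCodeAfterFatalError() {
        AtomicInteger haltedWith = new AtomicInteger(-1);
        // In a real process the halter would be: code -> Runtime.getRuntime().halt(code)
        FatalErrorHandler handler = new FatalErrorHandler(haltedWith::set);

        Thread worker = new Thread(() -> {
            throw new NoClassDefFoundError("org/example/Missing"); // hypothetical missing class
        }, "updateTask-demo");
        worker.setUncaughtExceptionHandler(handler);
        worker.start();
        try {
            worker.join(); // the handler runs on the worker thread before it terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return haltedWith.get();
    }

    public static void main(String[] args) {
        System.out.println("halt requested with code " + haltCodeAfterFatalError());
    }
}
```

Halting, rather than letting the thread die quietly, avoids leaving a node running in a state it cannot recover from.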